Preface



This volume contains the papers presented at SWAT 2004, the 9th Scandinavian
Workshop on Algorithm Theory, which was held on July 8-10, 2004, at the
Louisiana Museum of Modern Art in Humlebaek on the Øresund coast north of
Copenhagen. The SWAT workshop, in reality a full-fledged conference, has been
held biennially since 1988 and rotates among the five Nordic countries: Denmark,
Finland, Iceland, Norway, and Sweden. The previous meetings took place
in Halmstad (1988), Bergen (1990), Helsinki (1992), Århus (1994), Reykjavik
(1996), Stockholm (1998), Bergen (2000), and Turku (2002). SWAT alternates
with the Workshop on Algorithms and Data Structures (WADS), held in odd-
numbered years.

The call for papers invited contributions on all aspects of algorithm theory. A
total of 121 submissions was received, an overall SWAT high. These underwent
thorough reviewing, and the program committee met in Copenhagen on March
20-21, 2004, and selected 40 papers for presentation at the conference. The
program committee was impressed with the quality of the submissions and, given
the constraints imposed by the choice of conference venue and duration, had
to make some tough decisions. The scientific program was enriched by invited
presentations by Gerth Stølting Brodal (University of Aarhus) and Charles E.
Leiserson (Massachusetts Institute of Technology).

Two satellite events were held immediately before SWAT 2004: the Workshop 
on On-Line Algorithms (OLA 2004), organized by members of the Department 
of Mathematics and Computer Science at the University of Southern Denmark, 
and the Summer School on Experimental Algorithmics, organized by the Perfor- 
mance Engineering Laboratory in the Department of Computing at the Univer- 
sity of Copenhagen. More information about SWAT 2004 and its satellite events 
is available at the conference web site http://swat.diku.dk/. 

We thank all of the many persons whose efforts contributed to making SWAT 
2004 a success. These include the members of the steering, program, and or- 
ganizing committees, the invited speakers, the authors who submitted papers, 
the numerous referees who assisted the program committee, and the support 
staff at the conference. Our special thanks go to Frank Kammer (University of 
Augsburg), who maintained the submission server, and Sebastian Berg and his 
associates (Less is more ApS), who implemented and hosted the web-based
conference shop. We also thank our institutional and industrial sponsors for their
support. 
Design and Analysis
of Dynamic Multithreaded Algorithms 



Charles E. Leiserson* 

MIT Computer Science and Artificial Intelligence Laboratory 
Cambridge, Massachusetts 02139, USA 



Dynamic multithreaded languages provide low-overhead fork-join primitives to 
express parallelism. One such language is Cilk [3,5], which was developed in the 
MIT Laboratory for Computer Science (now the MIT Computer Science and 
Artificial Intelligence Laboratory). Cilk minimally extends the C programming 
language to allow interactions among computational threads to be specified in a 
simple and high-level fashion. Cilk’s provably efficient runtime system dynami- 
cally maps a user’s program onto available physical resources using a randomized 
“work-stealing” scheduler, freeing the programmer from concerns of communi- 
cation protocols and load balancing. 

Cilk provides an abstract performance model for algorithmic analysis. This
model characterizes the performance of a multithreaded algorithm in terms of
two quantities: its work $T_1$, which is the total time needed to execute the
computation serially, and its critical-path length $T_\infty$, which is its execution time
on an infinite number of processors. Cilk provides instrumentation that allows a
user to measure these two quantities. Cilk’s scheduler executes a Cilk computation
on $P$ processors in expected time

$$T_P = T_1/P + O(T_\infty),$$

assuming an ideal parallel computer. This equation resembles “Brent’s theorem”
[1,4] and is optimal to within a constant factor, since $T_1/P$ and $T_\infty$ are both
lower bounds.
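To make the two quantities concrete, the following Python sketch builds the DAG of a small, hypothetical divide-and-conquer computation (every node one unit-time instruction), measures $T_1$ and $T_\infty$, and simulates a simple greedy scheduler. It is not Cilk’s randomized work-stealing scheduler; a greedy scheduler is used here only because it obeys the classical bound $\max(T_1/P, T_\infty) \le T_P \le T_1/P + T_\infty$, which the script checks. The names `build_fork_join` and `greedy_steps` are illustrative.

```python
def build_fork_join(n, edges, next_id):
    """Append the DAG of a hypothetical divide-and-conquer computation on n
    leaves to `edges` and return its (entry, exit) node ids.

    Every node is one unit-time instruction; `next_id` is a one-element list
    used as a mutable node counter.
    """
    entry = next_id[0]
    next_id[0] += 1
    if n == 1:
        return entry, entry  # a leaf is a single instruction
    left_in, left_out = build_fork_join(n // 2, edges, next_id)
    right_in, right_out = build_fork_join(n - n // 2, edges, next_id)
    join = next_id[0]
    next_id[0] += 1
    # fork -> both children, both children -> join (the "sync")
    edges.extend([(entry, left_in), (entry, right_in), (left_out, join), (right_out, join)])
    return entry, join


def greedy_steps(num_nodes, edges, P):
    """Simulate a greedy scheduler: each step runs min(P, #ready) ready nodes."""
    succ = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1
    ready = [u for u in range(num_nodes) if indegree[u] == 0]
    steps = 0
    while ready:
        running, ready = ready[:P], ready[P:]
        steps += 1
        for u in running:
            for v in succ[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return steps


edges, next_id = [], [0]
build_fork_join(16, edges, next_id)
T1 = next_id[0]                       # work: number of unit-time nodes
T_inf = greedy_steps(T1, edges, T1)   # span: enough processors for every ready node
for P in (1, 2, 4, 8):
    TP = greedy_steps(T1, edges, P)
    assert max(T1 / P, T_inf) <= TP <= T1 / P + T_inf
    print(f"P={P}: T_P={TP}, T_1/P + T_inf = {T1 / P + T_inf:.1f}")
```

Running the greedy simulation with as many processors as nodes yields exactly the longest path, which is why it doubles as the span computation.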

The fork-join primitives of a dynamic multithreaded language encourage the 
design of divide-and-conquer parallel algorithms. Consequently, recurrences are 
a natural way to express the work and critical-path length of a multithreaded al- 
gorithm. Recurrence analysis can be used to develop good algorithms for matrix 
multiplication, sorting, and a host of other problems. 
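As an illustration of recurrence analysis, consider a hypothetical parallel mergesort that sorts the two halves in parallel but merges serially, with the merge cost taken as exactly $n$. Its work and critical-path length satisfy $W(n) = 2W(n/2) + n$ and $S(n) = S(n/2) + n$. The sketch below evaluates both for powers of two; they come out as $W(n) = n\log_2 n + n$ and $S(n) = 2n - 1$, so the parallelism $W(n)/S(n)$ grows only logarithmically because the serial merge dominates the critical path.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def work(n):
    """W(n) = 2 W(n/2) + n, W(1) = 1: both halves plus a merge of cost n."""
    return 1 if n == 1 else 2 * work(n // 2) + n


@lru_cache(maxsize=None)
def span(n):
    """S(n) = S(n/2) + n, S(1) = 1: halves run in parallel, the merge is serial."""
    return 1 if n == 1 else span(n // 2) + n


for k in (4, 10, 20):
    n = 2 ** k
    print(n, work(n), span(n), round(work(n) / span(n), 2))
```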

The algorithmic technology behind dynamic multithreading has been applied
to develop MIT’s championship computer-chess programs, *Socrates and
Cilkchess [2]. The methodology of analyzing work and critical-path length
provides a clean theoretical model that works in practice.

See http://supertech.csail.mit.edu/cilk for more background on Cilk
and to download the Cilk manual and software release.

* Support for this research was provided in part by the Defense Advanced Research
Projects Agency (DARPA) under Grant F30602-97-1-0270, by the National Science
Foundation under Grants EIA-9975036 and AGI-032497, and by the Singapore-MIT
Alliance.
Cache-Oblivious Algorithms and Data Structures



Gerth Stølting Brodal*
Abstract. Frigo, Leiserson, Prokop and Ramachandran in 1999 intro-
duced the ideal-cache model as a formal model of computation for devel-
oping algorithms in environments with multiple levels of caching, and
coined the terminology of cache-oblivious algorithms. Cache-oblivious
algorithms are described as standard RAM algorithms with only one
memory level, i.e. without any knowledge about memory hierarchies,
but are analyzed in the two-level I/O model of Aggarwal and Vitter for
an arbitrary memory and block size and an optimal off-line cache re-
placement strategy. The results are algorithms that automatically apply
to multi-level memory hierarchies. This paper gives an overview of the
results achieved on cache-oblivious algorithms and data structures since
the seminal paper by Frigo et al.



1 Introduction 

Modern computers are characterized by having a memory system consisting of
a hierarchy of several levels of memory, where each level acts as a cache for
the next level [46]. The typical memory levels of current machines are registers,
level 1 cache, level 2 cache, level 3 cache, main memory, and disk. While the sizes
of the levels increase with the distance from the CPU, the access times to the
levels also get larger, most dramatically when going from main memory to disk.
To circumvent dramatic performance loss, data is moved between the memory
levels in blocks (cache lines or disk blocks). As a consequence of this organization
of the memory, the memory access pattern of an algorithm has a major influence
on its practical running time. A basic rule commonly stated in the literature for
achieving good running times is to ensure locality of reference in the developed
algorithms.

1.1 I/O Model 

Several models have been proposed in recent years to model modern memory 
hierarchies. The most successful of these models (in terms of number of publica- 
tions) is the two-level I/O model introduced by Aggarwal and Vitter in 1988 [6]: 

* Supported by the Carlsberg Foundation (contract number ANS-0257/20). 

** Basic Research in Computer Science, www.brics.dk, funded by the Danish National 
Research Foundation. 
The memory hierarchy is assumed to consist of two levels, a main memory of
size M and an infinite secondary memory, where data is transferred between the
two levels in blocks of B consecutive elements. Computations are performed
on elements in main memory, and algorithms have complete control over block
transfers, I/Os, between the two levels. The resource studied in the I/O model
is the number of I/Os performed by algorithms; e.g., scanning an N-element
array in secondary memory requires $O(N/B)$ I/Os. Aggarwal and Vitter in their
seminal paper [6] proved that in the I/O model, comparison-based sorting requires
$\Theta(\mathrm{Sort}_{M,B}(N)) = \Theta(\frac{N}{B}\log_{M/B}\frac{N}{B})$ I/Os, which is achieved by
$\Theta(M/B)$-way mergesort, and searching requires $\Theta(\log_B N)$ I/Os,
which is achieved by B-trees [17].
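To see where the sorting bound comes from, the sketch below counts the block transfers of a textbook external multi-way mergesort under a simple cost model of our own: run formation and every merge pass each read and write all N/B blocks once, and the merge fan-in is M/B - 1 (one buffer block per input run plus one output block). The function name and the exact constants are assumptions for illustration; the Θ-bound above hides them.

```python
def multiway_mergesort_ios(N, M, B):
    """Count block transfers of a textbook external (M/B)-way mergesort.

    Illustrative cost model (an assumption): run formation reads and writes
    every block once, producing ceil(N/M) sorted runs; each merge pass, with
    one input buffer per run plus one output buffer, also reads and writes
    every block once.
    """
    assert M >= 3 * B, "need room for two input buffers and one output buffer"
    blocks = -(-N // B)          # ceil(N / B)
    runs = -(-N // M)            # ceil(N / M) runs after run formation
    fan_in = M // B - 1          # Theta(M/B)-way merging
    passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)
        passes += 1
    return passes, 2 * blocks * (1 + passes)


passes, ios = multiway_mergesort_ios(N=2**20, M=2**12, B=2**6)
print(passes, ios)
```

With N = 2^20, M = 2^12 and B = 2^6 this gives 2 merge passes: each pass costs O(N/B) I/Os and there are about log_{M/B}(N/M) passes, which is where the logarithm with base M/B in the sorting bound comes from.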

The success of the I/O model is likely due to its simplicity, which makes the design
and analysis of external memory algorithms feasible, while adequately modeling
the case where the I/Os between two levels of the memory hierarchy dominate
the running time. For an overview of the comprehensive work done related to
the I/O model we refer the reader to the surveys by Arge [9] and Vitter [69], 
and the book [57]. 

More sophisticated multi-level models have been studied in the literature [3,4,
5,7,16,47,62,63,70,71], but none of these have gained the same level of attention 
as the I/O model of Aggarwal and Vitter [6], likely due to the complexity of 
describing algorithms for these models. 

A limitation of the I/O model is that the parameters B and M are required 
to be known to the algorithms. In practice, these parameters might not always 
be available. Furthermore, the available memory for a process may vary over
time, e.g. in a multiprocess environment the available memory depends on the 
memory usage of the other processes being scheduled. 



1.2 Ideal-Cache Model 

Frigo, Leiserson, Prokop and Ramachandran in 1999 introduced the ideal-cache
model and coined the terminology of cache-oblivious algorithms [44]. The ideal-
cache model can be viewed as a formal framework for analyzing the locality of
reference of an algorithm that is oblivious to the presence of the memory
hierarchy. The basic idea is to describe algorithms for the standard RAM model
with only one memory level, i.e. without any knowledge about memory hierar-
chies. The algorithms are then analyzed in the two-level I/O model of Aggarwal
and Vitter for an arbitrary memory size M and block size B, assuming that
I/Os are performed by an optimal off-line cache replacement strategy. Since the
analysis of a cache-oblivious algorithm should hold for all values of M and B,
the analysis also holds for all levels of a multi-level memory hierarchy (see [44]
for a detailed discussion of the technical requirements to be satisfied).

For algorithms satisfying that reducing the cache size by a factor of two does
not increase the number of I/Os by more than a constant factor, Frigo et al. [44]
proved that the assumption of an optimal off-line cache replacement strategy can
be replaced by the on-line least-recently used (LRU) cache replacement strategy,
by appealing to Sleator and Tarjan’s classic competitiveness result [64] for LRU
paging. Since LRU adapts to dynamically changing memory sizes, cache-
oblivious algorithms also adapt to changes in the available memory.
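The following sketch counts block misses of an LRU cache holding M/B blocks of B elements, i.e. the ideal-cache setting with LRU in place of the optimal off-line policy. The function name and the element-index addressing are our assumptions. A scan never mentions M or B, yet its miss count is ⌈N/B⌉ for every cache size.

```python
from collections import OrderedDict


def lru_misses(addresses, M, B):
    """Count block misses of an LRU cache holding M // B blocks of B elements.

    `addresses` are element indices; element a lives in block a // B.
    """
    cache = OrderedDict()            # block id -> None, least recently used first
    capacity = M // B
    misses = 0
    for a in addresses:
        block = a // B
        if block in cache:
            cache.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            cache[block] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return misses


for M in (64, 1024):
    print(M, lru_misses(range(1000), M=M, B=8))
print(lru_misses(list(range(72)) * 2, M=64, B=8))
```

The last call scans 9 blocks twice through an 8-block cache; LRU misses on every block both times (18 misses), whereas an optimal off-line policy with the same cache would miss only 10 times. Gaps like this are why the LRU result compares against an optimal strategy with a smaller cache.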

A simple cache-oblivious algorithm is the scanning of an N-element array, which
requires an optimal $O(N/B)$ I/Os. The linear time selection algorithm of Blum et
al. [27] is based primarily on scanning, and it can be proved that their selection
algorithm is an optimal cache-oblivious algorithm performing $O(N/B)$ I/Os.
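The selection algorithm mentioned above can be sketched in Python as the classical median-of-medians method. The sketch works on Python lists and does not count I/Os; the point is that every step apart from the recursion is a scan (forming groups of five, collecting their medians, partitioning around the pivot), which is why the algorithm needs no knowledge of M or B.

```python
def select(items, k):
    """Return the k-th smallest element (0-based) by median-of-medians."""
    if len(items) <= 5:
        return sorted(items)[k]
    # One scan: the median of every group of five consecutive elements.
    groups = [sorted(items[i:i + 5]) for i in range(0, len(items), 5)]
    medians = [g[len(g) // 2] for g in groups]
    pivot = select(medians, len(medians) // 2)
    # Scans that partition the input around the pivot.
    lows = [x for x in items if x < pivot]
    highs = [x for x in items if x > pivot]
    equal = len(items) - len(lows) - len(highs)
    if k < len(lows):
        return select(lows, k)
    if k < len(lows) + equal:
        return pivot
    return select(highs, k - len(lows) - equal)


data = [(i * 7919) % 1009 for i in range(500)]
print(select(data, 250))
```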

Frigo et al. in their seminal paper [44] considered cache-oblivious algorithms
for several algorithmic problems: The transposition of an $n \times m$ matrix was solved
using optimal $O(mn/B)$ I/Os. The multiplication of an $m \times n$ matrix and an
$n \times p$ matrix was solved using $O((mn + np + mp)/B + mnp/(B\sqrt{M}))$ I/Os. For
square matrices this matches a lower bound by Hong and Kung [47] for algo-
rithms computing the matrix product only using additions and multiplications.
In [44] it was furthermore proved that Strassen’s matrix multiplication algo-
rithm [65] is cache-oblivious and requires $O(n + n^2/B + n^{\log_2 7}/(B\sqrt{M}))$ I/Os.
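The transposition algorithm of [44] is a divide-and-conquer recursion that always halves the longer dimension of the current block. The following is a minimal Python sketch of that idea; the base-case cutoff of 16 entries is an arbitrary constant of ours, and what matters is that it does not depend on M or B.

```python
def transpose(A):
    """Return the transpose of the m x n matrix A (a list of rows)."""
    m, n = len(A), len(A[0])
    T = [[None] * m for _ in range(n)]

    def rec(r0, r1, c0, c1):
        # Transpose the block A[r0:r1][c0:c1] into T.
        rows, cols = r1 - r0, c1 - c0
        if rows * cols <= 16:                 # constant-size base case, independent of B
            for i in range(r0, r1):
                for j in range(c0, c1):
                    T[j][i] = A[i][j]
        elif rows >= cols:                    # split the longer dimension in half
            mid = (r0 + r1) // 2
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            mid = (c0 + c1) // 2
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, m, 0, n)
    return T


A = [[i * 13 + j for j in range(13)] for i in range(7)]
print(transpose(A)[12])
```

Once a sub-block fits in the cache, all its rows and columns stay resident, regardless of what M and B actually are; this is the intuition behind the optimal bound.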

Optimal comparison-based sorting algorithms performing $O(\mathrm{Sort}(N))$ I/Os were
presented, under the so-called tall cache assumption $M = \Omega(B^2)$. Both merging-
based (Funnelsort) and distribution-based sorting algorithms were presented.
Finally, an algorithm for the fast Fourier transform (FFT) was presented requir-
ing $O(\mathrm{Sort}(N))$ I/Os. A cache-oblivious algorithm for LU decomposition with
pivoting appeared in [66].

The remainder of this paper gives an overview of the results on cache-
oblivious algorithms and data structures achieved during the five years since
the seminal paper by Frigo et al. Recent surveys on cache-oblivious algorithms 
and data structures can also be found in [13,38,50]. 

2 Sorting and Permuting 

The first cache-oblivious sorting algorithms were presented by Frigo et al. [44]:
one based on the merging paradigm, Funnelsort, and one based on the distri-
bution paradigm. Both algorithms require the tall cache assumption $M \geq B^2$.
A simplified version of Funnelsort, denoted Lazy Funnelsort, was presented in [28],
requiring the tall cache assumption $M \geq B^{1+\varepsilon}$. An empirical study of the
developed cache-oblivious sorting algorithms is presented in [33].
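Funnelsort itself depends on the recursively defined k-merger, which is too long for a short example. The sketch below keeps only Funnelsort’s recursive outer structure: split the input into about $N^{1/3}$ segments of about $N^{2/3}$ elements, sort each recursively, and merge the resulting runs. Python’s `heapq.merge` stands in for the k-merger, so this sketch does not have Funnelsort’s I/O guarantees; the cutoff of 8 elements is an arbitrary choice.

```python
import heapq


def funnelsort_sketch(items):
    """Recursive outer structure of Funnelsort; heapq.merge replaces the k-merger."""
    n = len(items)
    if n <= 8:                                   # arbitrary small base case
        return sorted(items)
    k = max(2, round(n ** (1 / 3)))              # about N^(1/3) segments ...
    size = -(-n // k)                            # ... of about N^(2/3) elements each
    runs = [funnelsort_sketch(items[i:i + size]) for i in range(0, n, size)]
    return list(heapq.merge(*runs))              # k-way merge of the sorted runs


print(funnelsort_sketch([(i * 37) % 101 for i in range(20)]))
```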

That I/O-optimal cache-oblivious comparison-based sorting is not possible
without a tall cache assumption is proved in [30]. The paper shows an inherent
trade-off for cache-oblivious algorithms between the strength of the tall cache
assumption and the overhead for the case $M \gg B$. The result implies that both
Funnelsort and recursive binary mergesort are optimal algorithms in the sense
that they attain this trade-off, where recursive binary mergesort does not require
a tall cache assumption but performs $O(\frac{N}{B}\log_2\frac{N}{M})$ I/Os.

Permuting N elements in an array can be solved by either moving each ele-
ment independently to its new position by $O(1)$ I/Os or by sorting the elements
by their new positions. In [6] it is proved that permuting in the I/O model re-
quires $\Theta(\min\{N, \mathrm{Sort}(N)\})$ I/Os. In [30] it is proved that no cache-oblivious
algorithm can match this I/O performance, not even in the presence of a tall
cache assumption.
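The two permuting strategies can be sketched as follows. Both functions produce the same output and differ only in access pattern: direct placement writes to scattered positions (about one I/O per element when N is large), whereas sorting by destination can be done with any I/O-efficient sort. The function names are illustrative, and Python’s built-in `sorted` merely stands in for such a sort.

```python
def permute_direct(values, dest):
    """Move each element straight to its destination (scattered writes)."""
    out = [None] * len(values)
    for v, d in zip(values, dest):
        out[d] = v
    return out


def permute_by_sorting(values, dest):
    """Sort (destination, value) pairs by destination, then read off the values."""
    pairs = sorted(zip(dest, values), key=lambda p: p[0])
    return [v for _, v in pairs]


print(permute_direct(["a", "b", "c", "d"], [2, 0, 3, 1]))
print(permute_by_sorting(["a", "b", "c", "d"], [2, 0, 3, 1]))
```

An I/O-model algorithm that knows M and B can pick whichever strategy is cheaper; the result of [30] says that a cache-oblivious algorithm cannot make that choice optimally for all M and B.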

Variations of cache-oblivious sorting have been studied. An implicit cache- 
oblivious sorting algorithm was presented in [41], i.e. an algorithm that works 
with a single array of size N only storing the N input elements plus $O(1)$ machine
words. Sorting multisets has been studied in [40]. 

3 List Labeling 

Itai et al. [48] studied the problem of maintaining N elements in sorted order in
an array of length $O(N)$, an important problem in dynamic dictionaries when
an efficient range query operation is required to be supported [22]. The problem
is commonly denoted the list labeling problem, but in [22] it is denoted the packed
memory management problem. The reorganization primitive in [48] during in-
sertions and deletions of elements is the even redistribution of the elements in a
section of the array. Their approach uses amortized $O(\log^2 N)$ work per update.
A matching $\Omega(\log^2 N)$ lower bound for algorithms using even redistribution as
the primitive was given in [39]. A worst-case variant was developed by Willard
in [72]. Bender et al. [22] adapted the algorithms to the cache-oblivious setting,
supporting insertions and deletions in the array in amortized $O((\log^2 N)/B)$
I/Os, and guaranteeing that there are only $O(1)$ empty slots between two con-
secutive elements in the array. Bender et al. [18] refined the list labeling solution
to satisfy the property that every update (in addition to every traversal) con-
sists of $O(1)$ physical scans sequentially through memory. Updates still require
amortized $O((\log^2 N)/B)$ I/Os.
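The even-redistribution primitive can be sketched as a toy packed-memory array. The Python below is a simplified illustration under our own assumptions, not the structure of [22] or [48]: the leaf window size (4), the density thresholds (1.0 for leaf windows, decreasing to 0.75 for the whole array), the doubling rule, and the linear scan used to find the insertion point are all hypothetical choices, and deletions and the lower density bound that keeps gaps small are omitted. On an insertion it looks for the smallest aligned window around the insertion point that stays below its density threshold, and evenly redistributes that window’s elements together with the new key.

```python
import bisect


class PackedMemoryArray:
    """Toy packed-memory array: keys kept in sorted order in an array with gaps."""

    SEGMENT = 4  # hypothetical size of the smallest (leaf) window

    def __init__(self):
        self.slots = [None] * 8   # capacity is always a power of two
        self.count = 0

    def keys(self):
        return [x for x in self.slots if x is not None]

    def _spread(self, lo, hi, items):
        """Even redistribution of the sorted list `items` over slots[lo:hi]."""
        width = hi - lo
        self.slots[lo:hi] = [None] * width
        for j, x in enumerate(items):
            self.slots[lo + j * width // len(items)] = x

    def insert(self, key):
        if self.count + 1 > 0.75 * len(self.slots):   # whole array too dense: double it
            items = self.keys()
            self.slots = [None] * (2 * len(self.slots))
            self._spread(0, len(self.slots), items)
        cap = len(self.slots)
        occupied = [i for i, x in enumerate(self.slots) if x is not None]
        after = [i for i in occupied if self.slots[i] >= key]
        # Slot of the successor (or of the last key); a real PMA would search here.
        p = after[0] if after else (occupied[-1] if occupied else 0)
        height = max(1, (cap // self.SEGMENT).bit_length() - 1)
        width, level = self.SEGMENT, 0
        while True:
            lo = p // width * width                    # aligned window containing p
            window = [x for x in self.slots[lo:lo + width] if x is not None]
            bisect.insort(window, key)
            limit = 1.0 - 0.25 * level / height        # 1.0 at leaves, 0.75 at the root
            if width == cap or len(window) <= limit * width:
                self._spread(lo, lo + width, window)
                self.count += 1
                return
            width, level = 2 * width, level + 1


pma = PackedMemoryArray()
for x in [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]:
    pma.insert(x)
print(pma.keys())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(pma.slots)    # the same keys interleaved with None gaps
```

Because the chosen window always contains the successor of the new key, every key left of the window is smaller and every key right of it is at least as large, so sorted order is preserved globally.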

4 Search Trees 

Prokop in [60] proposed static cache-oblivious search trees with search cost
$O(\log_B N)$ I/Os, matching the search cost of standard (cache-aware) B-trees [17].
The search trees of Prokop are related to a data structure of van Emde Boas [67,
68], since the recursive layout of a search tree generated by Prokop’s scheme re-
sembles the layout of the search trees of van Emde Boas. The constant in the
$O(\log_B N)$ search cost was studied in [21], where it is proved that no cache-
oblivious algorithm can achieve a performance better than $\log_2 e \cdot \log_B N$ I/Os,
i.e. a factor $\approx 1.44$ slower than a cache-aware algorithm. Cache-oblivious search
trees avoiding the usage of pointers were presented in [31,53,59].
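The recursive layout can be computed directly: cut the tree at its middle level, lay out the top subtree recursively, and then each bottom subtree recursively from left to right. The sketch below returns the heap (BFS) numbers of a complete binary tree of the given height in that order. Giving the top part the smaller half (`h // 2`) is our convention; rounding the other way is equally valid.

```python
def veb_layout(height):
    """Heap (BFS) numbers of a complete binary tree of the given height,
    listed in van Emde Boas order. Node 1 is the root; node i has children 2i, 2i+1."""

    def layout(root, h):
        if h == 1:
            return [root]
        top_h = h // 2                               # our convention: smaller half on top
        bottom_h = h - top_h
        order = layout(root, top_h)                  # top recursive subtree first
        first_leaf = root << (top_h - 1)             # leftmost leaf of the top subtree
        for leaf in range(first_leaf, first_leaf + (1 << (top_h - 1))):
            for child in (2 * leaf, 2 * leaf + 1):   # roots of the bottom subtrees
                order.extend(layout(child, bottom_h))
        return order

    return layout(1, height)


print(veb_layout(4))  # [1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]
```

In this order every recursive subtree occupies a contiguous range of positions, so a root-to-leaf path crosses only a few subtrees whose size is around B, which is the source of the $O(\log_B N)$ search cost for every B at once.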

Dynamic B-trees were first presented by Bender et al. [22] achieving searches
in $O(\log_B N)$ I/Os and updates requiring amortized $O(\log_B N)$ I/Os. Simplified
constructions were presented in [23] and [31], where [31] is based on combining 
the recursive static layout of Prokop [60] and the dynamic search trees of low 
height by Andersson and Lai [8], and [23] is based on combining the static layout 
of Prokop with a data structure for the list labeling problem. 

A cache-oblivious dictionary based on exponential search trees was presented 
in [19]. The paper shows how to make the exponential search trees partially 
persistent, i.e. support queries in previous versions of the search tree, and how to
support efficient cache-oblivious finger searches, i.e. searches in the vicinity of a 
given element. The layout of arbitrary static trees was considered in [20]. Finally, 
optimal cache-oblivious implicit dictionaries were developed in [42] and [43]. 



5 Priority Queues 

Arge et al. [11] presented the first cache-oblivious priority queue, supporting inserts and
delete-min operations in $O(\frac{1}{B}\log_{M/B}\frac{N}{B})$ amortized I/Os. This matches the performance
achieved in the I/O model by e.g. the buffer trees of Arge [10]. The construction
in [11] is a general reduction to sorting. An alternative cache-oblivious priority queue
achieving the same I/O complexity as [11] was presented in [29]. This solution
is a more direct solution based on $k$-mergers introduced in the Funnelsort algo-
rithm [44,28].



6 Graph Algorithms 

The existence of a cache-oblivious priority queue enabled a sequence of cache-
oblivious graph algorithms. In [11] the following deterministic cache-oblivious
bounds are obtained: List ranking, computing the Euler tour of a tree, breadth-
first search (BFS) of a tree, and depth-first search (DFS) of a tree, all requiring
$O(\mathrm{Sort}(E))$ I/Os, matching the known bounds for the I/O model achieved in [36].
For directed BFS and DFS on general graphs a cache-oblivious algorithm was
presented performing $O((V + E/B)\log V + \mathrm{Sort}(E))$ I/Os, matching the known
best bounds for the I/O model [34]. For undirected DFS, an algorithm performing
$O(V + \mathrm{Sort}(E))$ I/Os was achieved, matching the bound for the I/O model
in [58]. Finally, an $O(\mathrm{Sort}(E)\log\log E)$ I/O minimum spanning tree algorithm
was presented, nearly matching the $O(\mathrm{Sort}(E)\log\log(VB/E))$ I/O bound in [12] for
the I/O model.

Abello et al. [1] presented for the I/O model a functional approach to solve a 
sequence of graph problems based on recursion and repeated use of sorting and 
scanning. Their randomized minimum spanning tree algorithm is actually also
cache-oblivious and performs expected $O(\mathrm{Sort}(E))$ I/Os.

In [56] it was shown how to solve undirected BFS in
$O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \sqrt{VE/B})$ I/Os for the I/O model, where $\mathrm{ST}(E)$ denotes the I/O bound for
computing a spanning tree of the graph. In [32] two cache-oblivious versions of
the algorithm in [56] were developed requiring C)(ST(if) -|- Sort(A) -|- ^ logE -I- 
y/VE/B) and 0{ST{E) + Sort{E) + § ■ EloglogV + ^/VE/ B ■ ^/VB / E^") I/Os 
respectively. 

Undirected single source shortest path (SSSP) can be solved cache-obliviously 
in $O(V + (E/B)\log(E/B))$ I/Os [32,37], matching the known bounds for the I/O
model [51]. 
7 Computational Geometry

Goodrich et al. [45] introduced the distribution sweeping approach to solve a
sequence of problems within computational geometry in the I/O model. A cache-
oblivious version of the distribution sweeping approach is developed in [28],
achieving the following results, where N is the input size and T the output size:
The 3D maxima problem on a set of points, computing the measure of a set
of axis-parallel rectangles, the all nearest neighbors problem, and computing
the visibility of a set of non-intersecting line segments from a point can be
solved using optimal $O(\mathrm{Sort}(N))$ I/Os. The orthogonal line segment intersection
reporting problem, batched orthogonal range queries, and reporting pairwise
intersections of axis-parallel rectangles can be solved using optimal
$O(\mathrm{Sort}(N) + T/B)$ I/Os.

A cache-oblivious data structure for the planar point location problem was
presented in [19]. It requires linear space, taking optimal $O(\log_B N)$ I/Os for
point location queries, where N is the number of line segments specifying the
partition of the plane. The pre-processing requires $O((N/B)\log_{M/B} N)$ I/Os.

Cache-oblivious algorithms for orthogonal range searching were presented
in [2], where both a kd-tree and a range-tree solution were given. A cache-oblivious
kd-tree is simply a normal kd-tree [24] laid out in memory using the van
Emde Boas layout. This structure uses linear space and answers queries in
$O(\sqrt{N/B} + T/B)$ I/Os; this is optimal among linear space structures [49].
Insertions are facilitated using the so-called logarithmic method of Bentley [25], and
require IoSm/b I/Os. The cache-oblivious range-tree presented in [2] 

supports range queries in $O(\log_B N + T/B)$ I/Os and requires space $O(N\log^2 N)$.
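A kd-tree and its orthogonal range query can be sketched as follows. This is an ordinary pointer-based kd-tree for 2D points (nested tuples, alternating split axis, median split), not the cache-oblivious structure of [2]: storing the same nodes in the van Emde Boas order is what the text above describes as the cache-oblivious version, and the logarithmic method for insertions is not shown. The function names are illustrative.

```python
def build_kdtree(points, depth=0):
    """Return a nested (point, axis, left, right) kd-tree for 2D points."""
    if not points:
        return None
    axis = depth % 2                                  # alternate x and y splits
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                               # median split
    return (pts[mid], axis,
            build_kdtree(pts[:mid], depth + 1),       # coordinates <= median
            build_kdtree(pts[mid + 1:], depth + 1))   # coordinates >= median


def range_query(node, lo, hi, out=None):
    """Report all points p with lo[d] <= p[d] <= hi[d] for d = 0, 1."""
    if out is None:
        out = []
    if node is None:
        return out
    point, axis, left, right = node
    if lo[0] <= point[0] <= hi[0] and lo[1] <= point[1] <= hi[1]:
        out.append(point)
    if lo[axis] <= point[axis]:       # left subtree may still hold points in range
        range_query(left, lo, hi, out)
    if point[axis] <= hi[axis]:       # right subtree may still hold points in range
        range_query(right, lo, hi, out)
    return out


pts = [((i * 37) % 101, (i * 61) % 103) for i in range(200)]
tree = build_kdtree(pts)
print(sorted(range_query(tree, (10, 20), (60, 70))))
```

In two dimensions such a query visits $O(\sqrt{N} + T)$ nodes; the blocking provided by the van Emde Boas layout is what the $O(\sqrt{N/B} + T/B)$ I/O bound quoted above relies on.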

8 Lower Bounds 

A general reduction technique for proving lower bounds for comparison-based
algorithms for the I/O model was presented in [15], allowing the reduction to
standard comparison trees. Lower bounds achieved for the I/O model also apply
immediately to cache-oblivious algorithms.

Bilardi and Peserico [26] have investigated the portability of algorithms across 
memory hierarchies in the HRAM-model, where they provide a CDAG compu-
tation and two machines such that any scheduling of the computation is a
polynomial factor slower than optimal on at least one of the machines. For cache-oblivious
algorithms lower bounds have been given for searching [21], and sorting and 
permuting [30]. 

9 Empirical Work 

The impact of different memory layouts for data structures has been studied 
before in different contexts. In connection with matrices, significant speedups 
can be achieved by using layouts optimized for the memory hierarchy — see e.g. 
the paper by Chatterjee et al. [35] and the references it contains.
Ladner et al. considered the effect of caches in connection with heaps [54],
sorting [55], and sequential and random traversals [52]. Using registers to im-
prove the running time of sorting was considered in [14]. Minimizing translation
look-aside buffer (TLB) misses and the case of low cache associativity were stud-
ied in [73]. Rahman et al. [61] made an empirical study of the performance of
various search tree implementations, with focus on showing the significance of
minimizing TLB misses. Brodal et al. [31] studied different memory layouts for
nearly perfectly balanced search trees. Ladner et al. [53] gave a comparison of cache-
aware and cache-oblivious static search trees using program instrumentation.
Empirical investigations of the practical efficiency of cache-oblivious algorithms
for sorting were done in [33].

The overall conclusion of these investigations is that cache-oblivious methods 
often outperform RAM algorithms, but not always as much as algorithms tuned 
to the specific memory hierarchy and problem size. On the other hand, cache- 
oblivious algorithms perform well on all levels of the memory hierarchy, and seem 
to be more robust to changing parameter sizes than cache-aware algorithms. 



10 Summary 

Since the seminal paper by Frigo et al. [44] in 1999 an amazing sequence of 
papers has been published on various cache-oblivious problems and data struc- 
tures, establishing cache-oblivious algorithms as an important subfield of exter- 
nal memory algorithms. Empirical work has documented the soundness of the 
cache-oblivious approach. The cache-oblivious model has not yet achieved the level
of success (in terms of number of publications) of the I/O model of Aggarwal and
Vitter, likely due to the complexity of the algorithm descriptions: The ideal-cache
model forces the logical structure of a cache-oblivious algorithm in most cases
to be more complex than the structure of a corresponding algorithm for the
I/O model.
Abstract. We consider the bi-criteria problem of minimizing the aver-
age flow time (average response time) of a collection of dynamically re-
leased equi-work processes subject to the constraint that a fixed amount 
of energy is available. We assume that the processor has the ability to 
dynamically scale the speed at which it runs, as do current microproces- 
sors from AMD, Intel, and Transmeta. We first reveal the combinatorial 
structure of the optimal schedule. We then use these insights to devise a 
relatively simple polynomial time algorithm to simultaneously compute, 
for each possible energy, the schedule with optimal average flow time 
subject to this energy constraint. 



1 Introduction 

Since the early 1970’s the power consumption of the most common micropro- 
cessors has increased by a factor of 3 approximately every 10 years [1]. Power 
consumption has always been a critical issue in portable platforms such as cell 
phones and laptops with limited batteries (as anyone who has taken their laptop 
on a long flight knows). Now power consumption has become a critical issue in 
more traditional settings. A 1998 estimate by the Information Technology In- 
dustry Council is that computers consume 13% of the electrical power generated 
in the United States [1]. Further, recent estimates are that energy costs are 25%
of the total cost of operating a server farm [5]. Limiting power consumption has
become a first-class architectural design constraint in almost all settings [5].

Several strategies have been proposed to limit power consumption in micro-
processors. Here we focus on the highest profile strategy: dynamic scaling of
speed (or voltage or power). Currently the dominant component of microproces-
sor power usage is switching loss [2]. The switching loss is roughly proportional
to $V^2 f$, the voltage squared times the frequency. But $V$ and $f$ are not inde-
pendent. There is a minimum voltage required to drive the microprocessor at
the desired frequency. This minimum voltage is approximately proportional to
the frequency [2]. Substituting $V \propto f$ into $V^2 f$ leads to the well known
cube-root rule that speed (and equivalently, frequency) is roughly proportional
to the cube-root of the power, or equivalently, that the power is proportional to
the speed cubed (or frequency cubed) [2]. Current microprocessors from AMD,
Intel and Transmeta allow the speed of the microprocessor to be set dynamically.
For example, a Pentium III processor uses 9 Watts when run at 500 megahertz
but uses 22 Watts when run at 650 megahertz [3]. Note this roughly comports
with what is predicted by the cube-root rule. The speed of the microprocessor
can generally be controlled by the microprocessor, the operating system, or a
user application.
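As a quick check of the Pentium III example, the sketch below scales the reported 9 W at 500 MHz by the cube-root rule. It assumes that power is exactly proportional to frequency cubed, which the text above describes only as a rough rule; the helper name is illustrative.

```python
def cube_rule_power(p0, f0, f1, alpha=3.0):
    """Predicted power at frequency f1, given power p0 at frequency f0,
    under the rule power ∝ frequency**alpha (alpha = 3: the cube-root rule)."""
    return p0 * (f1 / f0) ** alpha


predicted = cube_rule_power(9.0, 500.0, 650.0)
print(f"predicted {predicted:.1f} W at 650 MHz, reported 22 W")
```

The prediction of about 19.8 W is somewhat below the reported 22 W but of the same magnitude, which is the sense in which the figures "roughly comport" with the rule.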

One natural question is then what policy should be used to set the speed, 
say by the operating system. The speed setting policy is symbiotically related to 
the scheduling policy for determining which process to run. This is a bi-criteria 
optimization problem. The OS wants to both optimize some Quality of Service 
(QoS) measure that it provides to the applications, and to minimize the energy 
that it uses. The way in which this problem has been formalized in the literature 
[7,4] is to assume that the processes have deadlines, and then find the minimum 
energy schedule where all deadlines are met. However, in general computational 
settings, most processes do not have natural deadlines associated with them, 
which is why operating systems like Unix and Windows do not have deadline 
based schedulers. By far the most commonly used QoS measure in the computer 
systems literature is average flow time (average response time), which is the
average over all processes of the time that a process has to wait between its
release and its completion.

Thus, in this paper we initiate the study of the bi-criteria problem of min- 
imizing average flow time and minimizing energy usage. Note that these two 
criteria are in opposition, that is, increasing energy usage will decrease average 
flow time. The simplest way to formalize a bi-criteria optimization problem is 
to fix one parameter and to optimize the other parameter. For this problem it 
seems logical to us to initially fix the available energy, and then to minimize 
average flow time. This is certainly most logical in a setting such as a laptop 
where energy is provided by a battery. In this paper, we restrict our attention 
to the case that all processes have the same amount of work. The most impor- 
tant reason for our adoption of this restriction is that it lets us decouple the 
scheduling policy from the speed setting policy, and lets us concentrate our ef- 
forts on understanding speed scheduling. In the case of equi-work jobs, it is easy 
to see that the optimal scheduling policy is First-Come-First-Served. We assume 
that power is proportional to the speed raised to some power $\alpha$, which generalizes the
cube-root rule ($\alpha = 3$).
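Under this model (unit-work jobs, First-Come-First-Served order, power equal to speed raised to $\alpha$), the total flow time and energy of a schedule are easy to evaluate once each job’s speed is fixed. The sketch below assumes, for simplicity, that each job runs at one constant speed chosen in advance; `fcfs_flow_and_energy` is a hypothetical helper for evaluating a given choice of speeds, not the paper’s algorithm, which chooses the speeds optimally. A job run at speed s takes 1/s time units at power $s^\alpha$, so it consumes energy $s^{\alpha-1}$.

```python
def fcfs_flow_and_energy(release, speeds, alpha=3.0):
    """Total flow time and energy for unit-work jobs run First-Come-First-Served.

    release: sorted release times r_1 < ... < r_n
    speeds:  speeds[i] is the (assumed constant) speed of job i
    """
    clock = 0.0
    total_flow = 0.0
    energy = 0.0
    for r, s in zip(release, speeds):
        start = max(clock, r)        # wait for the release and for the previous job
        clock = start + 1.0 / s      # completion time C_i of one unit of work
        total_flow += clock - r      # flow time F_i = C_i - r_i
        energy += s ** (alpha - 1)   # power s**alpha for 1/s time units
    return total_flow, energy


print(fcfs_flow_and_energy([0, 1, 2], [1, 1, 1]))  # slower: less energy, more flow
print(fcfs_flow_and_energy([0, 1, 2], [2, 2, 2]))  # faster: more energy, less flow
```

Running all three jobs twice as fast halves the total flow time but, with $\alpha = 3$, quadruples the energy; this is the trade-off that the bi-criteria problem captures.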

We first reveal the combinatorial structure of the optimal schedule. This 
structure has some surprising aspects to it. For example, intuitively the less en- 
ergy available, the slower that jobs should be run in the optimal schedule. But we 
show that in fact this intuition is not always correct, that is, as energy decreases 
some jobs will actually be run at a higher speed in the optimal schedule. So none 
of the obvious properties (speed, energy used, etc.) of a job are monotone
functions of energy. Fortunately, we are saved by the fact that essentially everything,
including the underlying schedules, is a continuous function of energy. It also
helps that over all possible amounts of available energy, there are only
linearly many possible structurally different schedules. Using these facts we obtain
an $O(n^2 \log L)$ time algorithm to find all of these schedules, as well as the energy
range where each schedule is optimal. Here L is the range of possible energies divided
by the precision that we desire. Because this problem is highly non-linear, we
can get solutions where the variables are not only irrational, but do not appear
to have any simple short representation. In this paper, we will brush this finite
precision issue under the carpet. Handling this issue is rather orthogonal to the
combinatorial issues in which we are interested. Also, as a side product, these
schedules can be used to compute the schedule that optimizes energy usage
subject to a constraint on average flow time.

One of our students, Aleksandar Ivetic, implemented a variation of our al-
gorithm in Mathematica, and created a GUI in Java. The GUI lets you enter
an arbitrary instance. The GUI also has a slider that lets you change the available
energy and view the resulting optimal schedule. By moving the slider, you essen-
tially get a movie of the evolving optimal schedule. The software can be found
at http://www.cs.pitt.edu/~utp/energy.

We believe that this paper should be viewed as an initial investigation into a research area where there appear to be many combinatorially interesting problems with real applications. We discuss this further in the conclusion.



1.1 Related Results 

The way in which this problem has been formalized in the literature [7,4] is to 
assume that the processes have deadlines, and then find the minimum energy 
schedule where all deadlines are met. In [7], a polynomial-time offline algorithm
for computing the optimal schedule is presented. Further, [7] gives an online
algorithm with a constant competitive ratio. In [4], a variation on this problem is
considered. There it is assumed that the processor can be put into a lower-power 
sleep state, and that bringing the processor back to the on state requires a fixed 
amount of energy. The paper [4] gives a polynomial-time offline algorithm with 
an approximation ratio of 3, and an online algorithm with a constant competitive 
ratio. 

2 Definitions, Notation, and Preliminaries 

The input consists of $n$ jobs, referenced by the integers from 1 to $n$. The jobs will be processed on one machine. We normalize the unit of work so that each job has unit work. Job $i$ arrives at its release time $r_i$. We will assume that the jobs are labeled so that $r_1 < r_2 < \dots < r_n$. Note that, to ease the exposition, we assume that no pair of jobs has identical release dates. At the end we'll explain how the result can be easily generalized to the case that jobs can have identical release dates. A schedule specifies, for each time, the job that is run and the speed at which that job is run. If a job $i$ is run at speed $s$ for $t$ units of time, then $s \cdot t$ units of work are completed from $i$. The job $i$ is completed at the time $C_i$ when all of its work has been completed. Let $F_i = C_i - r_i$ be the flow time of job $i$, and $F = \sum_{i=1}^n F_i$ be the total flow time. There is a bound $A$ on the amount of available energy. If a job $i$ is run at speed $s$, then we assume the power (energy used per unit time) is $s^\alpha$. We assume throughout the paper that $\alpha > 1$. A schedule is feasible at energy level $A$ if it completes all jobs and the total amount of energy used is no more than $A$. We say that a feasible schedule is optimal for energy level $A$ if it has minimum total flow time (or equivalently, minimum average flow time) among all feasible schedules at energy level $A$.

It is easy to see that the optimal schedule runs jobs in First-Come-First-Served order. Thus, without loss of generality, we may only consider schedules where $C_1 < C_2 < \dots < C_n$. Further, there is always an optimal schedule that runs each job at a fixed speed. This was noted in [7], and immediately follows from the convexity of the speed to power function. Therefore, to specify an optimal schedule for this problem, it suffices to specify a speed $s_i$ for each job $i$. Let $x_i = 1/s_i$ be the time that job $i$ is processed, $p_i = s_i^\alpha$ be the power for job $i$, and $e_i = x_i p_i = s_i^{\alpha-1}$ be the energy used by the $i$th job. Let $E = \sum_{i=1}^n e_i$ be the total energy used.

Let $p = s_n^\alpha$ be the power of the $n$th job, which will play an important role in our results. To refer to a particular schedule $S$ when the schedule in question is not clear from context, we append $(S)$ to our notation, so $F(S)$ is the total flow time for schedule $S$.
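These definitions can be evaluated directly from the release times and speeds. The following Python sketch uses hypothetical helpers that are not the authors' code. It assumes unit-work jobs listed in release order and processed First-Come-First-Served, so job $i$ starts at $\max(r_i, C_{i-1})$ and runs for $x_i = 1/s_i$.

```python
from typing import List, Tuple


def completion_times(r: List[float], s: List[float]) -> List[float]:
    """FCFS completion times C_i for unit-work jobs with release times r and speeds s."""
    C: List[float] = []
    prev = 0.0  # dummy C_0 = 0
    for r_i, s_i in zip(r, s):
        start = max(r_i, prev)      # wait for the release and for the previous job
        prev = start + 1.0 / s_i    # x_i = 1/s_i time units to finish one unit of work
        C.append(prev)
    return C


def flow_and_energy(r: List[float], s: List[float], alpha: float) -> Tuple[float, float]:
    """Total flow time F = sum(C_i - r_i) and total energy E = sum(s_i^(alpha-1))."""
    C = completion_times(r, s)
    F = sum(c - r_i for c, r_i in zip(C, r))
    E = sum(s_i ** (alpha - 1) for s_i in s)  # e_i = x_i * s_i^alpha = s_i^(alpha-1)
    return F, E


# Release times 0, 1, 3 with alpha = 3, all jobs run at unit speed.
print(completion_times([0, 1, 3], [1, 1, 1]))      # [1.0, 2.0, 4.0]
print(flow_and_energy([0, 1, 3], [1, 1, 1], 3.0))  # (3.0, 3.0)
```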

If the problem is not yet clear to the reader, we refer him/her to the example 
instance that we use to illustrate our results in Section 3. 



3 Our Results 



3.1 Expressing the Problem as a Convex Program 

We note that the problem of minimizing total flow time subject to an energy constraint can be expressed as the following convex program CP. Since $\sum_i r_i$ is fixed, minimizing $\sum_i C_i$ is equivalent to minimizing the total flow time $F$.

$$\begin{aligned}
\text{minimize}\quad & C = \sum_{i=1}^n C_i \\
\text{subject to}\quad & \sum_{i=1}^n \frac{1}{x_i^{\alpha-1}} \le A \\
& x_i = C_i - \max\{r_i, C_{i-1}\} && \text{for } i = 1, \dots, n \\
& C_i \ge r_i && \text{for } i = 1, \dots, n \\
& C_i \le C_{i+1} && \text{for } i = 1, \dots, n
\end{aligned}$$

We assume that there are dummy variables $C_0 = 0$ and $C_{n+1} = \infty$. While CP is not, strictly speaking, convex, it can be made convex by introducing new variables $t_i$ and the constraints $t_i \ge r_i$, $t_i \ge C_{i-1}$, $x_i = C_i - t_i$, and $x_i \ge 1/A$. Thus this problem can be solved in polynomial time by the Ellipsoid algorithm, since the gradients of the constraint and objective functions are efficiently computable [6]. However, this would only solve the problem for one energy bound, would be quite complex, and is not guaranteed to run in low-order polynomial time.
3.2 Configuration Curves

The program CP prompts us to define what we call configurations. A configuration $\phi$ maps each job $i$, $1 \le i \le n-1$, to a relation in $\{<, =, >\}$. A schedule is in configuration $\phi$ if $C_i \mathrel{\phi(i)} r_{i+1}$ for all jobs $i$. So, for example, if $\phi(i)$ is <, then $C_i < r_{i+1}$. Define a configuration curve $M_\phi(A)$ to be a function that takes as input an energy bound $A$ and outputs the optimal schedule (or, polymorphically, the total flow time for this schedule) among those schedules in configuration $\phi$ that use at most $A$ units of energy. Note that $M_\phi(A)$ may not be defined for all $A$. Let $\Phi$ be the set of all configurations. Define $M(A)$ as

$$M(A) = \min_{\phi \in \Phi} M_\phi(A).$$

Our goal is to compute $M(A)$, which is the lower envelope of the exponentially many configuration curves. The problem is non-trivial because (1) it is non-trivial to determine the optimal configuration for a given amount of energy, and (2) even if the optimal configuration is known, it is non-trivial to determine how the energy should be distributed among the $n$ jobs to minimize total flow time.

As an example, consider the 3-job instance with release times 0, 1 and 3, and $\alpha = 3$. There are five configuration curves that are part of $M(A)$. The configuration curves for the configurations >X, ><< and <<< are shown in Figure 1, from left to right in this order. Note that configuration curves are in general only defined over a subrange of possible energies. The superposition of all five configuration curves that make up $M(A)$ is shown in Figure 2. We recommend viewing these figures in color. The corresponding configurations/schedules are shown in Figure 3. Note in Figure 3 that the job released at time 1 is run at faster speeds as energy decreases from 1.282 to 1.147.




Fig. 1. The configuration curves for three of the configurations of the example instance.

Fig. 2. The superposition of the five configuration curves that make up $M(A)$.

Fig. 3. Optimal schedules at different amounts of available energy.



3.3 Basic Algorithmic Strategy 

We are now ready to describe our basic algorithmic strategy. Intuitively, we trace out the lower envelope $M(A)$ of the configuration curves. We start with $A$ large and then continuously decrease $A$. When $A$ is sufficiently large, it is easy to see that in the optimal schedule $C_i < r_{i+1}$ for all $i$, and that all jobs are run at the same speed. As we decrease $A$, we need to explain:

— how to compute $M_\phi(A)$ for any given configuration $\phi$,

— how to recognize when the optimal configuration changes from a configuration $\phi$ to a configuration $\phi'$, and

— how to find the new optimal schedule when we switch configurations.

If we can accomplish these goals, then we can trace out $M(A)$ by continuously decreasing $A$.
Actually, rather than working with the energy bound $A$, it will prove more convenient to work with the power $p$ of job $n$. We will show in Lemma 8 that moving continuously from a high $p$ to a low $p$ is equivalent to moving continuously from a high $A$ to a low $A$. Later we will explain how to make this algorithm efficient by changing $A$ and $p$ discretely.

3.4 How to Find the New Optimal Schedule When We Switch Configurations

We tackle this last goal first by showing that the optimal schedule is unique for each energy. Thus, when two configuration curves intersect on the lower envelope for some energy $A$, we know that the underlying schedules are identical. Hence the optimal schedule for the new configuration is the same as the optimal schedule for the old configuration.

Lemma 1. The optimal schedule is unique given that $\alpha > 1$.

Proof. Assume, to reach a contradiction, that there are two different optimal schedules $S$ and $T$. Consider a third schedule $U$ where, for all jobs $i$, $x_i(U) = (x_i(S) + x_i(T))/2$. We now claim that $F(U) \le (F(S) + F(T))/2 = F(S) = F(T)$ and $E(U) < (E(S) + E(T))/2$. From this it follows that neither $S$ nor $T$ is optimal, since one can reinvest the $A - E(U) > 0$ units of unused energy into job $n$ in $U$ to get a schedule with better flow time than $F(U)$. This contradicts the optimality of $S$ and $T$.

To see that $F(U) \le (F(S) + F(T))/2$, consider a particular job $b$. Then there exists some job $a$ such that $C_b(U) = r_a + \sum_{i=a}^b x_i(U)$, namely the first job of the run of back-to-back jobs in $U$ that ends with $b$. Therefore, by the definition of $U$, $C_b(U) = r_a + \sum_{i=a}^b (x_i(S) + x_i(T))/2$. But in $S$ it must be the case that $C_b(S) \ge r_a + \sum_{i=a}^b x_i(S)$, since $S$ must process jobs $a$ through $b$ between time $r_a$ and $C_b(S)$. Similarly, $C_b(T) \ge r_a + \sum_{i=a}^b x_i(T)$. Averaging these two inequalities gives $(C_b(S) + C_b(T))/2 \ge r_a + \sum_{i=a}^b (x_i(S) + x_i(T))/2$. We know the right-hand side of this inequality is exactly $C_b(U)$. Hence $(C_b(S) + C_b(T))/2 \ge C_b(U)$. Since $b$ was arbitrarily chosen, it follows by summing over all jobs that $F(U) \le (F(S) + F(T))/2$.

Note that the function $f(x) = x^{1-\alpha}$ is strictly convex when $\alpha > 1$, so $f((x + y)/2) < (f(x) + f(y))/2$ whenever $x \ne y$. It then immediately follows that $E(U) < (E(S) + E(T))/2$ on a job-by-job basis, since $e_i(U) = \left((x_i(S) + x_i(T))/2\right)^{1-\alpha} \le (e_i(S) + e_i(T))/2$. The inequality is strict for every job $i$ with $x_i(S) \ne x_i(T)$, and there is at least one such job because $S \ne T$.

3.5 How to Compute $M_\phi(A)$

We now consider our first goal: given an energy $A$ and a fixed configuration $\phi$, find the optimal schedule among those schedules with configuration $\phi$ and energy at most $A$. Actually, we restate this problem as: given a power $p$ and a fixed configuration $\phi$, find the optimal schedule among those schedules with configuration $\phi$ and power $p$ on job $n$. We define a group as a maximal substring of jobs where $\phi(i)$ is >. More precisely, $a, a+1, \dots, b$ is a group if

— $a = 1$, or $\phi(a-1)$ is not >, and

— for $i = a, \dots, b-1$ it is the case that $\phi(i)$ is >, and

— $b = n$, or $\phi(b)$ is not >.

This group is open if $\phi(b)$ is <, and this group is closed if $\phi(b)$ is =.
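A short Python sketch makes the group definition concrete. It is a hypothetical helper, not from the paper. It assumes a configuration is given as a string of $n-1$ symbols from `<`, `=`, `>`, where character $i$ (0-indexed) relates $C_i$ to $r_{i+1}$ for 0-indexed jobs. A group ending at the last job is treated as open, since there is no later release for it to meet.

```python
from typing import List, Tuple


def groups(phi: str) -> List[Tuple[int, int, str]]:
    """Split jobs 0..n-1 into maximal runs joined by '>' and label each run open or closed."""
    n = len(phi) + 1
    result: List[Tuple[int, int, str]] = []
    a = 0  # first job of the current group
    for i in range(n):
        if i == n - 1 or phi[i] != '>':          # job i is the last job b of its group
            kind = 'open' if i == n - 1 or phi[i] == '<' else 'closed'
            result.append((a, i, kind))
            a = i + 1
    return result


print(groups("><=<"))  # [(0, 1, 'open'), (2, 2, 'closed'), (3, 3, 'open'), (4, 4, 'open')]
```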

Lemmas 2, 3, and 4 establish a relationship between the speed $s_i$, the speed $s_{i+1}$, and the power $p$ of job $n$, depending on whether $C_i > r_{i+1}$, $C_i < r_{i+1}$, or $C_i = r_{i+1}$.



Lemma 2. If $i$ is a job in the optimal schedule for energy $A$ such that $C_i > r_{i+1}$, then $s_i^\alpha = p + s_{i+1}^\alpha$.

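Proof. Let $\epsilon$ be a small number such that $C_i + \epsilon > r_{i+1}$. Note that $\epsilon$ is allowed to be either positive or negative. Consider the result of increasing $x_i$ by $\epsilon$, decreasing $x_{i+1}$ by $\epsilon$, and decreasing $x_n$ by $\epsilon$. This does not change the total completion time, as $C_i$ increases by $\epsilon$, $C_n$ decreases by $\epsilon$, and $C_{i+1}$ and all other $C_j$'s remain unchanged. The change in the energy used, $\Delta E(\epsilon)$, is

$$\Delta E(\epsilon) = \frac{1}{(x_i + \epsilon)^{\alpha-1}} + \frac{1}{(x_{i+1} - \epsilon)^{\alpha-1}} + \frac{1}{(x_n - \epsilon)^{\alpha-1}} - \frac{1}{x_i^{\alpha-1}} - \frac{1}{x_{i+1}^{\alpha-1}} - \frac{1}{x_n^{\alpha-1}} \qquad (1)$$

Since the optimal schedule is unique (Lemma 1), $\Delta E(\epsilon)$ must be positive for every $\epsilon \ne 0$. Otherwise, we could reinvest the energy saved by this change to obtain a schedule with a better total completion time. Hence the derivative $\Delta E'(\epsilon)$ evaluated at $\epsilon = 0$ must be 0:

$$\Delta E'(\epsilon) = \frac{-(\alpha-1)}{(x_i + \epsilon)^\alpha} + \frac{\alpha-1}{(x_{i+1} - \epsilon)^\alpha} + \frac{\alpha-1}{(x_n - \epsilon)^\alpha} \qquad (2)$$

Substituting $\epsilon = 0$ and solving $\Delta E'(0) = 0$, we get that $\frac{1}{x_i^\alpha} = \frac{1}{x_{i+1}^\alpha} + \frac{1}{x_n^\alpha}$, or equivalently, $s_i^\alpha = s_{i+1}^\alpha + p$.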
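As a sanity check on Equations (1) and (2), the Python sketch below compares a central finite difference of $\Delta E(\epsilon)$ with the closed form $\Delta E'(0) = (\alpha-1)(-s_i^\alpha + s_{i+1}^\alpha + s_n^\alpha)$. The processing times are arbitrary illustrative values, not taken from the paper. The last lines choose $x_i$ so that $s_i^\alpha = s_{i+1}^\alpha + s_n^\alpha$ and confirm that the derivative then vanishes, as Lemma 2 requires.

```python
def delta_energy(eps: float, x_i: float, x_next: float, x_n: float, alpha: float) -> float:
    """Equation (1): energy change after x_i += eps, x_{i+1} -= eps, x_n -= eps."""
    def f(x: float) -> float:
        return x ** (1.0 - alpha)  # energy of a unit-work job processed for time x
    return (f(x_i + eps) + f(x_next - eps) + f(x_n - eps)) - (f(x_i) + f(x_next) + f(x_n))


def derivative_at_zero(x_i: float, x_next: float, x_n: float, alpha: float) -> float:
    """Equation (2) at eps = 0, written in terms of speeds s = 1/x."""
    s_i, s_next, s_n = 1.0 / x_i, 1.0 / x_next, 1.0 / x_n
    return (alpha - 1.0) * (-s_i ** alpha + s_next ** alpha + s_n ** alpha)


alpha, x_i, x_next, x_n, h = 3.0, 0.8, 1.5, 2.0, 1e-6
numeric = (delta_energy(h, x_i, x_next, x_n, alpha)
           - delta_energy(-h, x_i, x_next, x_n, alpha)) / (2 * h)
print(numeric, derivative_at_zero(x_i, x_next, x_n, alpha))  # both about -3.0637

# Choose x_i from the optimality condition s_i^alpha = s_{i+1}^alpha + s_n^alpha.
x_opt = ((1 / x_next) ** alpha + (1 / x_n) ** alpha) ** (-1.0 / alpha)
print(derivative_at_zero(x_opt, x_next, x_n, alpha))  # approximately 0
```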



Lemma 3. If $i$ is a job in the optimal schedule for energy $A$ such that $C_i < r_{i+1}$, then $s_i^\alpha = p$.

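Proof. Let $\epsilon$ be a small number such that $C_i - \epsilon < r_{i+1}$. Note that $\epsilon$ is allowed to be either positive or negative. Consider the result of decreasing $x_i$ by $\epsilon$ and increasing $x_n$ by $\epsilon$. This does not change the total completion time, as $C_i$ decreases by $\epsilon$, $C_n$ increases by $\epsilon$, and all other $C_j$'s remain unchanged. The change in the energy used, $\Delta E(\epsilon)$, is

$$\Delta E(\epsilon) = \frac{1}{(x_i - \epsilon)^{\alpha-1}} + \frac{1}{(x_n + \epsilon)^{\alpha-1}} - \frac{1}{x_i^{\alpha-1}} - \frac{1}{x_n^{\alpha-1}} \qquad (3)$$

Since the optimal schedule is unique (Lemma 1), $\Delta E(\epsilon)$ must be positive for every $\epsilon \ne 0$. Otherwise, we could reinvest the energy saved by this change to obtain a schedule with a better total completion time. Hence the derivative $\Delta E'(\epsilon)$ evaluated at $\epsilon = 0$ must be 0:

$$\Delta E'(\epsilon) = \frac{\alpha-1}{(x_i - \epsilon)^\alpha} - \frac{\alpha-1}{(x_n + \epsilon)^\alpha} \qquad (4)$$

Substituting $\epsilon = 0$ and solving $\Delta E'(0) = 0$, we get that $\frac{1}{x_i^\alpha} = \frac{1}{x_n^\alpha}$, or equivalently, $s_i^\alpha = s_n^\alpha = p$.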



Lemma 4. If $i$ is a job in the optimal schedule for energy $A$ such that $C_i = r_{i+1}$, then $p \le s_i^\alpha \le p + s_{i+1}^\alpha$.

Proof. First we show that $s_i^\alpha \le p + s_{i+1}^\alpha$. Assume, to reach a contradiction, that there is an optimal solution with $-s_i^\alpha + s_{i+1}^\alpha + s_n^\alpha < 0$. Consider the transformation that increases $x_i$ by a small $\epsilon > 0$, decreases $x_{i+1}$ by $\epsilon$, and decreases $x_n$ by $\epsilon$. This does not change the total completion time. We now argue that this decreases the energy used. This is sufficient because one could then get a contradiction by reinvesting this saved energy into the last job to improve the total completion time. Note that the transformation will bring us into a new configuration because $C_i + \epsilon > C_i = r_{i+1}$. The change in energy used, $\Delta E(\epsilon)$, is then given by Equation (1), and its derivative, $\Delta E'(\epsilon)$, is given by Equation (2). Substituting $\epsilon = 0$ we get that

$$\Delta E'(0) = -\frac{\alpha-1}{x_i^\alpha} + \frac{\alpha-1}{x_{i+1}^\alpha} + \frac{\alpha-1}{x_n^\alpha} = (\alpha-1)(-s_i^\alpha + s_{i+1}^\alpha + s_n^\alpha) < 0,$$

where the inequality follows from the assumption above. Thus, we have a contradiction as claimed.

Next we show that $s_i^\alpha \ge p$. Assume, to reach a contradiction, that there is an optimal solution with $s_i^\alpha - s_n^\alpha < 0$. Consider the transformation that decreases $x_i$ by a small $\epsilon > 0$ and increases $x_n$ by $\epsilon$. This does not change the total completion time. We now argue that this decreases the energy used. This is sufficient because one could then get a contradiction by reinvesting this saved energy into the last job to improve the total completion time. Note that the transformation will bring us into a new configuration because $C_i - \epsilon < C_i = r_{i+1}$. The change in energy used, $\Delta E(\epsilon)$, is then given by Equation (3), and its derivative, $\Delta E'(\epsilon)$, is given by Equation (4). Substituting $\epsilon = 0$ we get that

$$\Delta E'(0) = \frac{\alpha-1}{x_i^\alpha} - \frac{\alpha-1}{x_n^\alpha} = (\alpha-1)(s_i^\alpha - s_n^\alpha) < 0,$$

where the inequality follows from the assumption above. Thus, we have a contradiction as claimed.



Lemma 5 states how to compute the speeds of the jobs in a group given the speed of the last job of the group.

Lemma 5. If $a, a+1, \dots, b$ are jobs in a group in the optimal schedule for energy level $A$, then $s_i^\alpha = s_b^\alpha + (b-i)p$ for $i = a, \dots, b$.
Proof. This follows by applying Lemma 2 to each of the jobs $i = b-1, b-2, \dots, a$, since $C_i > r_{i+1}$ for each of them.

It should now be clear how to compute the speeds/powers of the jobs $a, a+1, \dots, b$ in an open group. The power of the last job in the group is $p$ (Lemma 3), and the powers of the earlier jobs in the group are given by Lemma 5. Computing the speeds of the jobs in a closed group is a bit more involved. Lemma 6 establishes a relationship between the speeds of the jobs in a closed group and the length of the time period during which the jobs in this group are run.
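Lemma 5, together with Lemma 3 for the last job of an open group, translates directly into code. The sketch below is a hypothetical helper using 0-indexed jobs. It takes the power $s_b^\alpha$ of the group's last job and returns the speeds of jobs $a, \dots, b$; for an open group one passes $s_b^\alpha = p$.

```python
from typing import List


def group_speeds(a: int, b: int, last_power: float, p: float, alpha: float) -> List[float]:
    """Speeds of jobs a..b in a group whose last job has power last_power = s_b^alpha.

    Lemma 5: s_i^alpha = s_b^alpha + (b - i) * p for i = a, ..., b.
    """
    return [(last_power + (b - i) * p) ** (1.0 / alpha) for i in range(a, b + 1)]


def open_group_speeds(a: int, b: int, p: float, alpha: float) -> List[float]:
    """In an open group the last job runs at power p (Lemma 3), so s_b^alpha = p."""
    return group_speeds(a, b, p, p, alpha)


speeds = open_group_speeds(0, 2, p=1.0, alpha=2.0)
print([round(s ** 2, 9) for s in speeds])  # [3.0, 2.0, 1.0]
```

Consistent with Lemma 2, the powers grow by exactly $p$ per step toward the front of the group.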



Lemma 6. If $a, a+1, \dots, b$ are jobs in a closed group in the optimal schedule for energy level $A$, then $\sum_{i=a}^b \frac{1}{(s_b^\alpha + (b-i)p)^{1/\alpha}} = r_{b+1} - r_a$.

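Proof. From the definition of a closed group, jobs $a, a+1, \dots, b$ run back-to-back, job $a$ starts at time $r_a$, and job $b$ completes at time $r_{b+1}$. Thus,

$$r_{b+1} - r_a = \sum_{i=a}^b x_i = \sum_{i=a}^b \frac{1}{s_i} = \sum_{i=a}^b \frac{1}{(s_b^\alpha + (b-i)p)^{1/\alpha}},$$

where the last equality follows from Lemma 5.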

Lemma 6 gives an implicit definition of $s_b$ as a function of $p$, and hence of the other $s_i$'s in this closed group (using Lemma 5). However, it is hard, if at all possible, to determine a closed form for $s_b$. Lemma 7 tells us that $\sum_{i=a}^b 1/(s_b^\alpha + (b-i)p)^{1/\alpha}$ is strictly decreasing as a function of $s_b$. Therefore, one can determine $s_b$ from $p$ by binary search on $s_b$. We can then compute the speeds of the other jobs in this closed group using Lemma 5.

Lemma 7. When $p$ is fixed, $\sum_{i=a}^b 1/(s_b^\alpha + (b-i)p)^{1/\alpha}$ decreases as $s_b$ increases.

Proof. For $i = a, \dots, b$, as $s_b$ increases, $(s_b^\alpha + (b-i)p)^{1/\alpha}$ increases. Thus $1/(s_b^\alpha + (b-i)p)^{1/\alpha}$ decreases, and so does $\sum_{i=a}^b 1/(s_b^\alpha + (b-i)p)^{1/\alpha}$. ■

Finally, we show that continuously decreasing p is equivalent to continuously 
decreasing the energy bound A in the sense that they will trace out the same 
schedules, albeit at a different rate. 

Lemma 8. In the optimal schedules, the energy bound $A$ is a strictly increasing continuous function of $p$, and similarly, $p$ is a strictly increasing continuous function of $A$.

Proof. We first prove that there is a bijection between $p$ and $A$ in the optimal schedules, as well as in the optimal schedules restricted to a particular configuration. The fact that a fixed $p$ is mapped to a unique energy should be obvious from our development to date. That two different values of $p$ cannot map to optimal schedules with the same energy follows from Lemma 1.

Since the function from $p$ to $A$ is obviously continuous, it follows that this function is either strictly increasing or strictly decreasing. The fact that it is strictly increasing then follows from looking at the extreme points. If $A$ is very large, then the optimal configuration is all <, and $p$ is large. If $A$ is very small, then the optimal configuration is all >, and $p$ is small.
3.6 How to Recognize When the Optimal Configuration Changes

We now tackle the second of our three goals: how to recognize when the optimal configuration changes from a configuration $\phi$ to a configuration $\phi'$. The formulas for computing the speeds of the jobs in the previous section may not yield a schedule in configuration $\phi$. In particular, this could happen if one of the < constraints is violated, or if there is a last job $b$ in some closed group with $s_b^\alpha > s_{b+1}^\alpha + p$, violating Lemma 4. In a non-degenerate case, the configuration obtained from $p$ will differ from $\phi$ in exactly one constraint $\phi(i)$. The next configuration $\phi'$ is obtained from $\phi$ by changing $\phi(i)$: if $\phi(i)$ is <, it should be changed to =; if $\phi(i)$ is =, it should be changed to >. In a degenerate case, all violating $\phi(i)$'s need to be changed.
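The test described above is easy to express in code. The sketch below is hypothetical and deliberately narrow. It assumes the speeds `s` were already computed for configuration `phi` (for example, from Lemmas 3, 5, and 6) and checks only the two kinds of violations named in the text. It then applies the < to = and = to > updates to every flagged position, which also covers the degenerate case.

```python
from typing import List


def violations(phi: str, r: List[float], s: List[float], p: float, alpha: float,
               tol: float = 0.0) -> List[int]:
    """Positions i (0-indexed) where speeds s contradict configuration phi."""
    bad: List[int] = []
    C, prev = [], 0.0
    for r_i, s_i in zip(r, s):           # FCFS completion times, as in Section 2
        prev = max(r_i, prev) + 1.0 / s_i
        C.append(prev)
    for i, rel in enumerate(phi):
        if rel == '<' and C[i] >= r[i + 1] - tol:
            bad.append(i)                # job i no longer finishes before r_{i+1}
        elif rel == '=' and s[i] ** alpha > p + s[i + 1] ** alpha + tol:
            bad.append(i)                # upper bound of Lemma 4 is violated
    return bad


def next_configuration(phi: str, bad: List[int]) -> str:
    """Change each flagged '<' to '=' and each flagged '=' to '>'."""
    step = {'<': '=', '=': '>'}
    flagged = set(bad)
    return ''.join(step[c] if i in flagged else c for i, c in enumerate(phi))


# All-'<' configuration for release times 0, 1, 3 and alpha = 3: every job has power p (Lemma 3).
print(violations("<<", [0, 1, 3], [2, 2, 2], p=8.0, alpha=3))  # []
print(violations("<<", [0, 1, 3], [1, 1, 1], p=1.0, alpha=3))  # [0]
print(next_configuration("<<", [0]))                           # =<
```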



3.7 Implementation Details and Time Complexity 

To construct an efficient implementation of this algorithm, we need to change $p$ in a discrete fashion. This can be accomplished using binary search to find the next value of $p$ at which the optimal configuration changes. The condition for determining whether the current configuration is optimal for a particular $p$, described in the last subsection, can be computed in linear time. Thus in roughly $O(n \log L)$ time we can find the next configuration curve on the lower envelope, where $L$ is roughly the range of possible values of $p$ divided by our desired accuracy. It is easy to see that there are only $2n - 1$ configuration curves on the lower envelope: the only possible transitions are from < to = or from = to >, so each of the $n - 1$ relations changes at most twice. Thus we get a running time of roughly $O(n^2 \log L)$.
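Finding the next breakpoint is a one-dimensional search. The sketch below is a hypothetical helper, not the authors' implementation. It assumes a predicate `is_consistent(p)`, for instance one built from the violation check of Section 3.6. It also assumes the current configuration stays consistent on an interval of $p$ values containing `p_hi`. Each iteration halves the bracket, so about $\log_2 L$ iterations reach the desired accuracy, at one predicate call each.

```python
from typing import Callable


def next_breakpoint(is_consistent: Callable[[float], bool], p_hi: float, p_lo: float,
                    iters: int = 60) -> float:
    """Bisect on p between a consistent p_hi and an inconsistent p_lo."""
    if not is_consistent(p_hi) or is_consistent(p_lo):
        raise ValueError("need is_consistent(p_hi) and not is_consistent(p_lo)")
    for _ in range(iters):
        mid = (p_hi + p_lo) / 2.0
        if is_consistent(mid):
            p_hi = mid   # breakpoint lies below mid
        else:
            p_lo = mid   # breakpoint lies above mid
    return p_hi


# Toy use: the all-'<' configuration for release times 0, 1, 3 with alpha = 3.
# Every job runs for x = p**(-1/alpha) and must finish before the next release.
r, alpha = [0.0, 1.0, 3.0], 3.0


def all_less_ok(p: float) -> bool:
    x = p ** (-1.0 / alpha)
    return all(r[i] + x < r[i + 1] for i in range(len(r) - 1))


print(next_breakpoint(all_less_ok, 100.0, 0.01))  # close to 1.0
```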

If there are jobs with equal release dates then the only change that is required 
is that in the initial configuration all jobs with equal release dates are in one 
open group with speeds given by Lemmas 3 and 5. 



4 Conclusions and Future Directions 

We believe that this paper suggests several interesting algorithmic and combinatorial research directions. The two most obvious directions are to consider offline algorithms for arbitrary-length jobs, and to consider online algorithms.

First let us consider offline algorithms for arbitrary length jobs. A simple 
exchange argument shows that all optimal schedules maintain the invariant that 
they run a job with minimal remaining work. Note that this does not uniquely 
define the job ordering as the remaining work depends on the speed at which 
the jobs were run in the past. While many of our statements carry over to the case
of arbitrary job lengths, many do not. In particular,

— The most obvious mathematical programs corresponding to CP are no longer 
convex. 

— When a job is broken into multiple pieces due to preemptions, our notion of 
configurations breaks down. See Figure 4. 

— It is not clear that the lower envelope is of polynomial size. 
Fig. 4. Optimal configurations at different amounts of available energy when jobs have
unit work (left) and arbitrary amount of work (right) 



— There will be transitions on the lower envelope corresponding to reordering of jobs and to preemptions. Further, optimal schedules can change in a discontinuous fashion at these transitions. See Figure 4.

Of course, any algorithm implemented in an operating system must be online. However, an analysis of online algorithms seems at least a little tricky, for several reasons. First, it is not quite clear how to most reasonably formalize the problem. Second, the fact that energy is very non-local makes dealing with equal-energy schedules a bit messy. Hopefully our revelation of the structure of the optimal schedule in this paper will prove useful in this regard.
Abstract. In a combinatorial auction, k different items are sold to n bidders, and the objective of the seller is to maximize the revenue. The main difficulty in finding an optimal allocation is due to the fact that the valuation function of each bidder for bundles of items is not necessarily an additive function over the items. An auction with budget constraints is a common special case where bidders generally have additive valuations, yet they have a limit on their maximal valuation. Auctions with budget constraints were analyzed by Lehmann, Lehmann and Nisan [11], as part of a wider class of auctions, where they showed that maximizing the revenue is NP-hard and presented a greedy 2-approximation algorithm. In this paper we present exact and approximate algorithms for auctions with budget constraints. We present a randomized algorithm with an approximation ratio of ≈ 1.582, which can be derandomized. We analyze the special case where all bidders have the same budget constraint, and show an algorithm whose approximation ratio is between 1.3837 and 1.3951. We also present an FPTAS for the case of a constant number of bidders.



1 Introduction 

Auctions are a popular mechanism for selling and purchasing goods when tra- 
ditional market mechanisms based on supply and demand are not satisfying, or 
are not implementable. In a combinatorial auction, a number of items is sold to 
a group of bidders whose valuation function may not be additive, meaning that 
a valuation of a bundle of items may express relations between subsets of items. 

Mechanisms dealing with combinatorial auctions face several challenges. For instance, if there are $k$ items, then each bidder has to submit $2^k$ bids to fully express her valuation function. This exponential growth would make such an approach infeasible in practice. A typical alternative approach is to assume that bidders have simple preferences that can be expressed compactly.

Another important issue is computational, namely, deciding on the alloca- 
tion of the items to the bidders. The allocation should maximize an objective 
function of the auctioneer, which is usually either the auctioneer’s revenue or 
the economic efficiency. Finding an optimal allocation is computationally hard 
in general, although it is tractable in certain cases [14,15,17]. There are various methods to tackle this difficulty, such as finding an approximate allocation rather than the optimal one [11,6], or developing mechanisms which work well in practice but do not necessarily have a formal guarantee [6,10,14,16].

Lehmann, Lehmann and Nisan [11] have concentrated on combinatorial auc- 
tions where bidders’ valuations are known to be subadditive. A very natural 
subclass of subadditive valuations is decreasing marginal utilities (also known 
as submodular valuations), where the valuation a bidder gives to an item mono- 
tonically decreases as the set of items he already purchased grows. Formally, 
if $V(\cdot)$ denotes the valuation function of a bidder, then for any two bundles $S$ and $T$ such that $S \subseteq T$ and for any item $x$ such that $x \notin T$, we have $V(S \cup \{x\}) - V(S) \ge V(T \cup \{x\}) - V(T)$. Lehmann et al. [11] presented a greedy
algorithm for auctions with decreasing marginal utilities, and proved that it is 
a 2-approximation. 

In this paper we concentrate on auctions with budget constraints, which are 
a special case of auctions with decreasing marginal utilities. In such auctions 
bidder $i$ has a budget limit of $d_i$, and the valuations are additive as long as the limit is not met. Once the limit is reached, the valuation equals the budget limit. Namely, the valuation of bidder $i$ for a bundle $A$ is given by $\min(d_i, \sum_{j \in A} b_{ij})$, where $b_{ij}$ is the bid of bidder $i$ for item $j$. Art dealers and collectors, for example, are likely to have valuations with budget constraints, since they value each item separately but have a limit on their total expenses.
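A budget-constrained valuation is cheap to evaluate. The Python sketch below is illustrative only; the helper names and example bids are ours. It computes $\min(d_i, \sum_{j \in A} b_{ij})$ and the revenue of an allocation. It also runs a brute-force check of the decreasing-marginal-utility inequality stated earlier on a tiny instance.

```python
from itertools import combinations
from typing import Iterable, List, Optional, Sequence, Set


def budget_valuation(bids: Sequence[float], budget: float, bundle: Iterable[int]) -> float:
    """Valuation min(d_i, sum_{j in bundle} b_ij) of one bidder."""
    return min(budget, sum(bids[j] for j in bundle))


def revenue(bids: List[List[float]], budgets: List[float], owner: List[Optional[int]]) -> float:
    """Total payment when item j goes to bidder owner[j] (None means unsold)."""
    bundles: List[List[int]] = [[] for _ in budgets]
    for j, i in enumerate(owner):
        if i is not None:
            bundles[i].append(j)
    return sum(budget_valuation(b, d, S) for b, d, S in zip(bids, budgets, bundles))


def is_submodular(bids: Sequence[float], budget: float) -> bool:
    """Brute-force check of V(S+x) - V(S) >= V(T+x) - V(T) for all S <= T and x not in T."""
    items = range(len(bids))
    subsets = [set(c) for m in range(len(bids) + 1) for c in combinations(items, m)]

    def V(S: Set[int]) -> float:
        return budget_valuation(bids, budget, S)

    return all(V(S | {x}) - V(S) >= V(T | {x}) - V(T)
               for T in subsets for S in subsets if S <= T
               for x in items if x not in T)


bids, budgets = [[4, 3, 2], [3, 3, 3]], [5, 6]
print(revenue(bids, budgets, [0, 1, 1]))   # 4 + min(6, 6) = 10
print(is_submodular(bids[0], budgets[0]))  # True
```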

Bidding in auctions with budget constraints is a rather concise process, since 
each bidder has to submit only k bids, where k is the number of auctioned 
items, and also submit the budget limit. Each bidder is charged according to the 
valuation of the items allocated to her. Our goal is to maximize the revenue of 
the auctioneer. 

We prove that finding an optimal allocation that maximizes the revenue is NP-hard even if there are only two bidders with identical valuations. We show that an exact solution can be found in time $O(\min(n4^k, k^2 4^k + nk))$, where $k$ denotes the number of items and $n$ denotes the number of bidders. If the number of bidders is constant, there is also a pseudo-polynomial algorithm.

Our main results are polynomial time approximation algorithms. We present a randomized algorithm with an approximation ratio of ≈ 1.582, and also derandomize it. We then exhibit improved approximation ratios when all bidders have the same budget constraint (but possibly different valuations), and prove that the approximation ratio is between 1.3837 and 1.3951. We also present an FPTAS for the case of a constant number of bidders. (We remark that the greedy allocation algorithm of [11] for auctions with budget constraints has approximation ratio 2, even if there are only two bidders.)

An issue that we have not covered in this paper is the effect of the pricing 
scheme on the bidding strategies of the bidders. Bidders may lie about their 
valuations if it suits their personal interest, and therefore there is an incentive 
to search for a pricing mechanism that will induce truthfulness, i.e. reporting the 
true valuation is a dominant strategy. Designing truthful mechanisms that use 
approximate allocations is a challenging task, and was successfully accomplished
under certain assumptions on the bidders’ valuations and the auctioned items 
[1,2,4,5,7,13]. In contrast, we concentrate only on maximizing the auctioneer’s 
revenue given the bids of the bidders. This approach is reasonable if the bidders 
are indifferent to the allocation mechanism, as long as the pricing is according 
to their revealed bids. 

The rest of this paper is organized as follows: Section 2 formally defines the 
auction with budget constraints problem. Section 3 presents algorithms for exact 
solutions and hardness results, and Section 4 presents algorithms for approximate 
solutions. 



2 Model and Notations 



An auction with budget constraints consists of $n$ bidders and $k$ items. Let $b_{ij}$ denote the bid of bidder $i$ for item $j$, which is the maximal price that the bidder is willing to pay for this item, assuming that the budget constraint is not met. Let $d_i$ denote the budget constraint of bidder $i$.

Let $z_{ij} \in \{0, 1\}$ denote the allocation of the items, where $z_{ij} = 1$ if bidder $i$ receives item $j$ and $z_{ij} = 0$ otherwise. Given an allocation, the price that each bidder is willing to pay is $P_i = \min\{d_i, \sum_{j=1}^k z_{ij} b_{ij}\}$. The objective of allocation with budget constraints is to maximize the total payment of all bidders.

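This allocation problem can be presented formally as the following Integer Programming (IP) problem:

$$\begin{aligned}
\max\quad & \sum_{i=1}^n P_i && \text{/* maximizing revenue */} \\
\text{s.t.}\quad & P_i \le \sum_{j=1}^k z_{ij} b_{ij}, \quad 1 \le i \le n && \text{/* additive valuations */} \\
& P_i \le d_i, \quad 1 \le i \le n && \text{/* budget constraints */} \\
& \sum_{i=1}^n z_{ij} \le 1, \quad 1 \le j \le k && \text{/* one copy of each item */} \\
& z_{ij} \in \{0, 1\}, \quad 1 \le j \le k,\ 1 \le i \le n && \text{/* integral allocation */}
\end{aligned} \qquad (1)$$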



Without loss of generality, we assume that the budget constraint is consistent with the bids, i.e., $d_i \ge b_{ij}$ for $1 \le i \le n$, $1 \le j \le k$, and that the budget constraint is effective, i.e., $d_i \le \sum_{j=1}^k b_{ij}$ for all $1 \le i \le n$. We also assume that all bids and budget constraints are nonnegative.



3 Exact Solutions 

Lehmann, Lehmann and Nisan [11] have analyzed auctions where the bidders 
have submodular valuations, meaning that the marginal utility that each bidder 
gains for any item decreases as the set of items already allocated to this bidder 
increases. They prove that finding an optimal allocation is NP-hard even if there 
are only two bidders with additive valuations up to budget constraints. The 
following theorem, whose proof is based on a reduction from PARTITION [9], strengthens this result.
Theorem 1. Finding the optimal allocation for an auction with budget constraints is NP-hard even for two bidders with identical bids and budget constraints.

Using dynamic programming [3], an exact solution can be found in time exponential in the number of items. In the $i$-th stage of the dynamic programming, optimal allocations of every subset of the $k$ items to the first $i$ bidders are computed, using the optimal allocations from the previous stage. This process yields the following:

Theorem 2. An exact optimal allocation for an auction with budget constraints can be found in time $O(n4^k)$.

For $n \gg k$, since in an optimal allocation each item is sold only to one of the $k$ highest bidders for that item, the time complexity can be reduced to $O(k^2 4^k + nk)$. When the number of bidders is a constant, a pseudo-polynomial algorithm based on dynamic programming exists. The details are omitted due to space constraints.
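The dynamic program behind Theorem 2 can be sketched with bitmasks. This Python version is our own illustration, not the authors' code. `best[S]` holds the maximum revenue obtainable by selling items of the set $S$ to the bidders processed so far (items may remain unsold), and bidder $i$ takes some submask $T \subseteq S$. Enumerating submasks gives $O(n 3^k)$ transitions, which stays within the $O(n4^k)$ bound.

```python
from typing import List


def optimal_revenue(bids: List[List[float]], budgets: List[float]) -> float:
    """Exact revenue-maximizing allocation value via DP over subsets of items."""
    n, k = len(bids), len(bids[0])
    full = (1 << k) - 1
    best = [0.0] * (1 << k)  # no bidders processed yet: revenue 0 for every item set
    for i in range(n):
        # value[T] = min(d_i, sum of b_ij over items j in bitmask T)
        value = [min(budgets[i], sum(bids[i][j] for j in range(k) if T >> j & 1))
                 for T in range(1 << k)]
        new = [0.0] * (1 << k)
        for S in range(1 << k):
            T = S
            while True:  # every submask T of S is a candidate bundle for bidder i
                new[S] = max(new[S], best[S ^ T] + value[T])
                if T == 0:
                    break
                T = (T - 1) & S
        best = new
    return best[full]


print(optimal_revenue([[4, 3, 2], [3, 3, 3]], [5, 6]))  # 10.0
```

With two identical bidders, the instance behaves like PARTITION: revenue $2d$ is reachable exactly when the bids split into two halves of value $d$.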

4 Approximate Solutions 

In this section we present our main results, which include approximation algo- 
rithms for the case that the number of bidders is constant and for the general 
case. We also present improved bounds when all bidders have the same budget 
constraint, yet possibly different valuations. 



4.1 Constant Number of Bidders 

If the number of bidders is constant, then there exists a fully polynomial time approximation scheme (FPTAS), meaning that the approximate allocation is at least $1 - \epsilon$ (for any $\epsilon > 0$) times the optimal allocation, and the running time is polynomial in the number of items $k$ and in $1/\epsilon$. (The algorithm is an adaptation of the FPTAS of Horowitz and Sahni [8] for the scheduling problem with unrelated machines.)

The algorithm uses sets of tuples $(v_1, a_1, v_2, a_2, \dots, v_n, a_n, t)$ to construct the approximation. Each tuple represents an allocation of a subset of the items to the bidders. Let $v_i$ be the benefit of bidder $i$ from the items allocated to her, and let $a_i$ be a bit-vector indicating which items were allocated to this bidder. Let $t$ be the total benefit of the partial allocation. The set $S_j$ contains tuples representing allocations of the first $j$ items.

Algorithm Dynamic Programming Allocation (DPA) 

1. Let $\gamma = \max_i\{d_i\}$.

2. Divide the segment [0,ri7] into ^ equal intervals of length ^ each. 

3. Initialize $S_0 = \{(0, 0, \dots, 0)\}$.

4. For each item $j$:

a) Construct $S_j$ from $S_{j-1}$ by replacing each tuple $s$ with $n$ tuples $s_1, \dots, s_n$, where tuple $s_i$ represents the same allocation as $s$, with item $j$ allocated to bidder $i$.

b) For any tuple $s = (v_1, a_1, \dots, v_n, a_n, t)$, if there is a tuple $s' = (v'_1, a'_1, \dots, v'_n, a'_n, t')$ such that $v_i$ and $v'_i$ are in the same interval for every $i$, and $t' \le t$, remove $s'$ from $S_j$.

5. Return the allocation represented by the tuple from Sk with the largest total 
benefit. 
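The trimming idea behind DPA can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes that v_i is the budget-capped benefit min(d_i, sum of bids on the items she holds), it leaves the interval length as a caller-chosen parameter `delta`, and it stores the owner of every item instead of the bit-vectors a_i. The names `dpa` and `State` are hypothetical.

```python
from typing import Dict, List, Tuple

# (total benefit t, capped benefits v_1..v_n, owner of each item allocated so far)
State = Tuple[float, Tuple[float, ...], Tuple[int, ...]]

def dpa(bids: List[List[float]], budgets: List[float], delta: float) -> Tuple[float, List[int]]:
    """Trimmed dynamic program over allocations, in the spirit of Algorithm DPA.

    bids[i][j] is bidder i's bid on item j and budgets[i] is d_i. delta is the
    (assumed) interval length used for trimming in step 4b.
    """
    n, k = len(budgets), len(bids[0])
    states: Dict[Tuple[int, ...], State] = {(0,) * n: (0.0, (0.0,) * n, ())}  # S_0
    for j in range(k):
        trimmed: Dict[Tuple[int, ...], State] = {}
        for _, v, owner in states.values():
            for i in range(n):
                # Step 4a: the same allocation as before, with item j given to bidder i.
                new_v = list(v)
                new_v[i] = min(budgets[i], v[i] + bids[i][j])
                new_t = sum(new_v)
                # Step 4b: tuples whose v_i fall in the same interval for every i
                # compete, and only the one with the largest total t is kept.
                key = tuple(int(val // delta) for val in new_v)
                if key not in trimmed or trimmed[key][0] < new_t:
                    trimmed[key] = (new_t, tuple(new_v), owner + (i,))
        states = trimmed
    best_t, _, best_owner = max(states.values(), key=lambda s: s[0])
    return best_t, list(best_owner)  # step 5
```

With a very small `delta` only states with (numerically) identical benefits are merged, so the sketch behaves like exhaustive search; a larger `delta` keeps fewer states at some cost in accuracy.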

Analyzing Algorithm DPA yields the following (proof omitted): 

Theorem 3. Algorithm DPA is an FPTAS for the auction with budget constraints problem with a constant number of bidders.

Since the number of tuples that DPA must store is polynomial in 1/ε, its space complexity is polynomial in 1/ε. We note that there exists an approximation algorithm with space complexity polynomial in log(1/ε), which is the space required for representing the value ε. That approximation, however, is a PTAS but not an FPTAS, i.e., its running time is not polynomial in 1/ε. The description of this algorithm is omitted due to space considerations.

4.2 General Number of Bidders 

In this section we analyze an algorithm with a provable approximation ratio of 
1.582 for an arbitrary number of bidders. 

In order to find an approximation to this allocation problem, we use the following Linear Program (LP), which solves a relaxed version of the original Integer Program (1):

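$$
\begin{aligned}
\max\ & \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij} b_{ij} \\
\text{s.t.}\ & \sum_{j=1}^{k} x_{ij} b_{ij} \le d_i, && 1 \le i \le n \\
& \sum_{i=1}^{n} x_{ij} \le 1, && 1 \le j \le k \\
& x_{ij} \ge 0, && 1 \le j \le k,\ 1 \le i \le n
\end{aligned}
$$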

The relaxation replaces the original Boolean variables z_ij by variables x_ij, indicating a fractional assignment of items. The equations are simplified by removing the p_i variables, which become redundant in the LP. As is the case with any relaxation method, our main task is to round the fractional assignments to an integral solution.
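To make the relaxation concrete, the following Python sketch checks a candidate fractional assignment x against the three constraint families and evaluates the objective. It does not solve the LP (any LP solver could supply x); the helper names are hypothetical, and the sample values in the usage are the fractional assignment from the first example in the proof of Theorem 10.

```python
from typing import List

def lp_violations(x: List[List[float]], bids: List[List[float]],
                  budgets: List[float], tol: float = 1e-9) -> List[str]:
    """List the LP constraints that a fractional assignment x[i][j] violates."""
    n, k = len(budgets), len(bids[0])
    problems: List[str] = []
    for i in range(n):
        # Budget constraint: sum_j x_ij * b_ij <= d_i.
        if sum(x[i][j] * bids[i][j] for j in range(k)) > budgets[i] + tol:
            problems.append(f"bidder {i} exceeds her budget")
    for j in range(k):
        # Each item is handed out at most once in total: sum_i x_ij <= 1.
        if sum(x[i][j] for i in range(n)) > 1 + tol:
            problems.append(f"item {j} is assigned more than once")
    # Non-negativity: x_ij >= 0.
    problems += [f"x[{i}][{j}] < 0" for i in range(n) for j in range(k) if x[i][j] < -tol]
    return problems

def lp_objective(x: List[List[float]], bids: List[List[float]]) -> float:
    """Objective sum_i sum_j x_ij * b_ij (the p_i variables are dropped, as in the text)."""
    return sum(x[i][j] * bids[i][j] for i in range(len(x)) for j in range(len(bids[0])))
```

For bids [[1, 0, 2], [0, 1, 2]], budgets [2, 2] and x = [[1, 0, 0.5], [0, 1, 0.5]], the checker reports no violation and the objective is 4.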

Algorithm Random Rounding (RR) is an approximation algorithm for 
the auction with budget constraints problem with a variable number of bidders. 

Algorithm RR 

1. Find an optimal fractional allocation using LP. 

2. For each item j, assign it to bidder i with probability x_ij.
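The rounding step of Algorithm RR can be sketched in Python as below, assuming the fractional solution x is already available (for example from an LP solver). The names are hypothetical. An item whose fractions sum to less than 1 stays unsold with the leftover probability, and each bidder pays the smaller of her budget and the sum of her winning bids.

```python
import random
from typing import List, Optional

def random_rounding(x: List[List[float]], rng: random.Random) -> List[Optional[int]]:
    """Step 2 of RR: give item j to bidder i with probability x[i][j].

    Assumes a feasible LP solution (sum_i x[i][j] <= 1); with the remaining
    probability the item is left unsold (None).
    """
    n, k = len(x), len(x[0])
    owner: List[Optional[int]] = []
    for j in range(k):
        r = rng.random()               # uniform in [0, 1)
        acc, chosen = 0.0, None
        for i in range(n):
            acc += x[i][j]
            if r < acc:                # r falls in bidder i's slice of [0, 1)
                chosen = i
                break
        owner.append(chosen)
    return owner

def revenue(bids: List[List[float]], budgets: List[float],
            owner: List[Optional[int]]) -> float:
    """Revenue of an integral allocation: bidder i pays min(d_i, her winning bids)."""
    spent = [0.0] * len(budgets)
    for j, i in enumerate(owner):
        if i is not None:
            spent[i] += bids[i][j]
    return sum(min(d, s) for d, s in zip(budgets, spent))
```

Averaging `revenue` over many calls of `random_rounding` estimates the expected revenue of RR that Theorem 4 analyzes.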
Obviously, Algorithm RR is randomized, outputs a feasible allocation, and runs in polynomial time. The following theorem bounds the approximation ratio of the algorithm.

Theorem 4. The expected approximation ratio of Algorithm RR is at most e/(e−1) ≈ 1.582.

Proof. Let Z_i be a random variable indicating the revenue from bidder i after the allocation. The expected total revenue is Σ_i E(Z_i). Therefore, it is sufficient to prove that for any bidder, the ratio between the benefit of the fractional allocation and the expected benefit of the final integer allocation is at most e/(e−1).

We analyze separately the expected revenue from each bidder i. Without loss of generality, when considering bidder i we normalize the bids and the budget constraint such that d_i = 1, in order to simplify the calculations. Let B_i = Σ_j x_ij b_ij denote the revenue from bidder i generated by the fractional assignment; by the budget constraint, B_i ≤ 1. We prove that E(Z_i) ≥ (1 − 1/e) B_i. Without loss of generality, we assume that the indices of the items assigned (fully or fractionally) to bidder i are 1, ..., r.

Let X_ij be a random variable which indicates whether item j is allocated to bidder i. The random variable X_ij is 1 with probability x_ij and 0 otherwise. The revenue from bidder i is therefore Z_i = min(1, Σ_{j=1}^{r} X_ij b_ij).

Suppose we replace b_i1 and x_i1 with b'_i1 = 1 and x'_i1 = b_i1 x_i1, and let X'_i1 be 1 with probability x'_i1 and 0 otherwise. The size of the fractional item assigned by the Linear Program is b'_i1 x'_i1 = b_i1 x_i1, meaning that the benefit of bidder i in the fractional assignment remains unchanged. We observe the effect of replacing X_i1 with the corresponding X'_i1 on Z_i when the remaining variables are kept constant. We denote Z_i1 = min(1, Σ_{j=2}^{r} X_ij b_ij) and observe that Z_i − Z_i1 is a random variable that denotes the marginal contribution of X_i1 to the total revenue. We examine how replacing X_i1 with X'_i1 affects the possible values of Z_i − Z_i1.

1. If 0 ≤ Z_i1 ≤ 1 − b_i1: the marginal contribution of X_i1 is either 0 or b_i1, so its expected contribution is b_i1 x_i1. The contribution of X'_i1 is either 0 or 1 − Z_i1 ≤ 1, so its expected marginal contribution is at most x'_i1 = b_i1 x_i1.

2. If 1 − b_i1 < Z_i1 ≤ 1: the marginal contribution of both variables is either 0 or 1 − Z_i1. The expected marginal contribution of X_i1 is (1 − Z_i1) x_i1. The expected marginal contribution of X'_i1 is (1 − Z_i1) x_i1 b_i1 ≤ (1 − Z_i1) x_i1.

In both cases, replacing X_i1 with X'_i1 can only decrease E(Z_i), without changing B_i. Similarly, for each 2 ≤ j ≤ r we replace b_ij with b'_ij = 1, x_ij with x'_ij = b_ij x_ij, and X_ij with X'_ij. Since no replacement increases E(Z_i), we have E(Z_i) = E(min(1, Σ_{j=1}^{r} X_ij b_ij)) ≥ E(min(1, Σ_{j=1}^{r} X'_ij)).

Let Z'_i = min(1, Σ_{j=1}^{r} X'_ij). Since each X'_ij is either 0 or 1, Z'_i is also either 0 or 1. Therefore:

$$E(Z'_i) = P(Z'_i = 1) = 1 - P(Z'_i = 0) = 1 - \prod_{j=1}^{r} \left(1 - x'_{ij}\right) \qquad (3)$$
The expectation is minimized when Π_{j=1}^{r} (1 − x'_ij) is maximized. Under the constraint B_i = Σ_{j=1}^{r} b_ij x_ij = Σ_{j=1}^{r} x'_ij, the maximum is attained when all x'_ij are equal to B_i/r. Therefore:

$$E(Z_i) \ge E(Z'_i) \ge 1 - \left(1 - \frac{B_i}{r}\right)^{r} \ge 1 - e^{-B_i} \ge B_i\left(1 - e^{-1}\right) \qquad (4)$$

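The last inequality follows since 1 − e^{−x} ≥ x(1 − e^{−1}) for x ∈ [0, 1] (the left-hand side is concave and the two sides agree at x = 0 and x = 1), and B_i ≤ 1. Therefore, we have:

$$\frac{B_i}{E(Z_i)} \le \frac{B_i}{\left(1 - e^{-1}\right) B_i} = \frac{e}{e-1} \approx 1.582 \qquad (5)$$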



Since the approximation holds for the expected revenue of each bidder separately, 
it also holds for the expected total revenue. □ 
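The inequalities (3) to (5) can be sanity-checked numerically. The Python sketch below (a check on a grid, not a proof; function names are hypothetical) evaluates (3) in the normalized unit-bid case, confirms that an equal split of the fractions gives the smallest value for a fixed sum, and tests the chain (4) for B_i ∈ [0, 1].

```python
import math
from typing import List

def unit_bid_revenue(xs: List[float]) -> float:
    """Equation (3): E(Z'_i) = 1 - prod_j (1 - x'_ij) when every normalized bid is 1."""
    prod = 1.0
    for x in xs:
        prod *= 1.0 - x
    return 1.0 - prod

def chain_holds(B: float, r: int, tol: float = 1e-12) -> bool:
    """Check 1 - (1 - B/r)^r >= 1 - e^{-B} >= B(1 - 1/e) from (4) for one (B, r)."""
    equal_split = unit_bid_revenue([B / r] * r)
    middle = 1.0 - math.exp(-B)
    return equal_split + tol >= middle and middle + tol >= B * (1.0 - 1.0 / math.e)
```

For example, the uneven split [0.3, 0.1, 0.2] gives 1 − 0.504 = 0.496, while the equal split [0.2, 0.2, 0.2] with the same sum gives 1 − 0.512 = 0.488.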



The following theorem shows that in the worst case, RR has an approximation ratio of e/(e−1).

Theorem 5. The expected approximation ratio of Algorithm RR is at least e/(e−1) ≈ 1.582.

Proof. The lower bound of e/(e−1) for the approximation ratio is achieved in the following setting: n + 1 bidders, A_0, A_1, ..., A_n, compete on 2n items, I_1, I_2, ..., I_{2n}. Bidder A_0 has a budget constraint d_0 = n, bids b_0j = n for items I_1, ..., I_n, and bids b_0j = 0 for items I_{n+1}, ..., I_{2n}. The other bidders all have a budget constraint of d_i = n^{−1}. For 1 ≤ i ≤ n, bidder A_i bids b_ij = n^{−1} for item I_i, b_ij = n^{−2} for item I_{n+i}, and b_ij = 0 for the other items.

The Linear Program finds a unique fractional assignment which satisfies all budget constraints: for each 1 ≤ i ≤ n, bidder A_i receives item I_{n+i} (fully) and a fraction (n − 1)/n of item I_i. The remaining fraction 1/n of item I_i is given to bidder A_0. The revenue from the fractional assignment is n + 1.

Algorithm RR achieves an expected revenue of n(1 − (1 − 1/n)^n) + (n − 1)/n + 1/n^2, which is approximately n(1 − e^{−1}) + 1 for sufficiently large n. □
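A short Python sketch that evaluates the expected revenue of RR on this instance in closed form, bidder by bidder as in the proof, and divides the fractional revenue n + 1 by it. The helper names are hypothetical.

```python
import math

def rr_expected_revenue(n: int) -> float:
    """Expected revenue of Algorithm RR on the lower-bound instance of Theorem 5."""
    # A_0 gets each of I_1..I_n with probability 1/n; a single win already
    # reaches her budget d_0 = n, so she pays n unless she wins nothing.
    a0 = n * (1.0 - (1.0 - 1.0 / n) ** n)
    # A_i always gets I_{n+i} (bid n^-2) and gets I_i (bid n^-1) with
    # probability (n-1)/n; with both items her payment is capped at d_i = 1/n.
    ai = ((n - 1) / n) * (1.0 / n) + (1.0 / n) * (1.0 / n ** 2)
    return a0 + n * ai

def rr_ratio(n: int) -> float:
    """Fractional (LP) revenue n + 1 divided by the expected revenue of RR."""
    return (n + 1) / rr_expected_revenue(n)
```

The ratio grows with n (about 1.48 for n = 10 and about 1.58 for n = 1000) and approaches e/(e−1).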

From Theorems 4 and 5 we have the following corollary. 

Corollary 1. The approximation ratio of Algorithm RR is exactly e/(e−1).



4.3 Derandomized Rounding 

A natural derandomization of Algorithm RR which maintains the approximation ratio would be to assign the items sequentially, such that the expected revenue remains at least the original expectation. Although calculating the expected revenue exactly may be computationally hard, this difficulty can be resolved by replacing the exact expected revenue with lower bounds, which are derived by techniques similar to those used in the proof of Theorem 4.
This section discusses alternative deterministic algorithms for rounding the fractional assignments. Among all possibilities to round the fractions in a solution to an LP instance, we refer to the rounding with the highest total revenue as an optimal rounding. Obviously, an algorithm that returns an optimal rounding has an approximation ratio no worse than that of any other rounding algorithm, including RR.

A convenient way to observe the output of the LP is to construct a bipartite graph G = (N, K, E), where the nodes N and K correspond to the bidders and items, respectively, and the edges E indicate that a bidder was assigned an item (or a fraction of an item).

The original allocation problem can be divided into several subproblems, where each subproblem consists only of bidders and items from the same component of G. Solving the Linear Program separately for each subset of bidders and items should return the same allocation. Therefore, we may concentrate on each component of G separately.

The LP includes nk + n + k constraints using nk variables. In a basic optimal solution of the LP, at least nk of the constraints are satisfied with equality; since at most n + k of them are not of the form x_ij ≥ 0, at most n + k of the x_ij variables are non-zero. Each non-zero variable x_ij corresponds to one edge in G, and therefore G contains at most n + k edges. On the other hand, since G (restricted to one component) is connected and has n + k nodes, it must have at least n + k − 1 edges. Therefore, G is either a tree or has exactly one cycle.
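The edge counting above can be checked on concrete fractional solutions with a small Python sketch (hypothetical helper name; a union-find over bidder and item nodes). For each component with c nodes it compares the number of support edges with c − 1.

```python
from collections import Counter
from typing import List

def component_shapes(x: List[List[float]], eps: float = 1e-12) -> List[str]:
    """Classify every component of the support graph G of a fractional assignment.

    Bidder i is node i and item j is node n + j; x[i][j] > eps gives an edge.
    A component with c nodes and c - 1 edges is a tree; with c edges it has
    exactly one cycle. Isolated nodes (nothing assigned) are ignored.
    """
    n, k = len(x), len(x[0])
    parent = list(range(n + k))

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = [(i, n + j) for i in range(n) for j in range(k) if x[i][j] > eps]
    for a, b in edges:
        parent[find(a)] = find(b)
    touched = {v for e in edges for v in e}
    nodes = Counter(find(v) for v in touched)
    edge_count = Counter(find(a) for a, _ in edges)
    shapes = []
    for root in sorted(nodes):
        extra = edge_count[root] - (nodes[root] - 1)
        shapes.append("tree" if extra == 0 else "one cycle" if extra == 1 else "more cycles")
    return shapes
```

The assignment [[0.5, 0.5], [0.5, 0.5]] is optimal for some instances but yields a cycle; Lemma 1 below says such a solution can be moved to one whose graph is a tree.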

The following lemma states that if G has a cycle, then the optimal solution to the LP can be modified such that one edge is deleted from G, and therefore G becomes a tree while the optimality of the solution is maintained (proof omitted).

Lemma 1. There is a vertex of the polyhedron of the LP which maximizes the objective function and induces a graph without cycles, and it can be found in polynomial running time.

By applying Lemma 1, the fixed graph is a tree. Therefore, out of at most n + k constraints in the LP that are not satisfied with equality, exactly n + k − 1 are of the type x_ij ≥ 0 (one for each edge of the tree). This means that at most one of the non-trivial constraints is not tight: either there is at most one item which is not fully distributed, yet all bidders reach their budget constraint, or there is at most one bidder that does not reach her budget constraint, yet all items are fully distributed. We have the following observation:
We have the following observation: 

Observation 1. For each component in G, at most one bidder has not met its 
budget constraint. 

Algorithm Semi Optimal Rounding (SOR) returns an allocation that has a total revenue of at least 1 − ε times that of the optimal rounding, for any ε > 0.
The algorithm applies a recursive ‘divide and conquer’ process on the tree 
graph representing the fractional allocations to choose a nearly optimal rounding. 

Algorithm SOR(G, ε)

1. Find an optimal fractional solution using LP. 
2. Construct a bipartite graph G representing the LP allocation.

3. For each component Gi in G: 

a) If Gi contains a cycle, convert Gi to a tree by modifying the LP solution. 

b) Apply ROUND(G_i, ε).



We use the following notation in process ROUND: for a tree T and a node v, let T_{v,i} denote the i-th subtree rooted at v. Let T^+_{v,i} denote the tree containing the subtree T_{v,i}, the node v and the edge connecting v to T_{v,i}. When node v denotes a bidder, let u_i denote the i-th item node shared by v and bidders in T_{v,i}. Let T^−_{v,i} denote the same tree as T_{v,i}, with item u_i replaced by a dummy item u_i^−, which has zero valuations from all bidders.

Process ROUND(T, ε)

1. If T includes only one bidder, allocate all the items to this bidder. If there are no bidders in T, return a null assignment.

2. Otherwise, find a vertex v ∈ T which is a center of T.

3. If v represents an item, for each subtree T_{v,i} recursively compute ROUND(T^+_{v,i}, ε) and ROUND(T_{v,i}, ε). Allocate v to a bidder such that the total revenue is maximized (explanation follows).

4. If v represents a bidder, for each subtree T_{v,i} recursively compute ROUND(T_{v,i}, ε/2) and ROUND(T^−_{v,i}, ε/2). Find a combination of the partial allocations whose revenue is at least 1 − ε/2 times that of an optimal combination (explanation follows).

When the central node v represents an item, combining the partial allocations of the subtrees is a simple process, since only one subtree may receive item v. Formally, we enumerate over v's neighbors to calculate

$$\max_{i} \Big( \mathrm{ROUND}(T^{+}_{v,i}, \varepsilon) + \sum_{l \neq i} \mathrm{ROUND}(T_{v,l}, \varepsilon) \Big).$$
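The enumeration in step 3 takes linear time once the subtree revenues are known, as the following Python sketch shows. It assumes the recursive calls have already returned their revenues; `combine_at_item` is a hypothetical name.

```python
from typing import List

def combine_at_item(plus: List[float], minus: List[float]) -> float:
    """Best revenue when the center v is an item (step 3 of ROUND).

    plus[i]  -- revenue of ROUND(T+_{v,i}): subtree i may use item v
    minus[i] -- revenue of ROUND(T_{v,i}):  subtree i without item v
    Only one subtree can receive v, so every i is tried; O(r) after one sum.
    """
    total_minus = sum(minus)
    return max(p + total_minus - m for p, m in zip(plus, minus))
```

For plus = [5, 4] and minus = [3, 1], giving v to the second subtree yields 4 + 3 = 7, which beats 5 + 1 = 6.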

However, if v represents a bidder, the number of combinations is exponential 
in the degree of v, and this is why an approximation is preferred over an exact 
solution. The approximation process is as follows: 

Let r be the degree of node v. Let b_{v,i} be the bid of bidder v for the item shared with bidders in the i-th subtree. Let C_i = ROUND(T_{v,i}, ε/2) be the approximated revenue of rounding the i-th subtree when v does not get the i-th item it shares. Let C_i^− = ROUND(T^−_{v,i}, ε/2) be the approximated revenue of rounding the i-th subtree when v gets the i-th item (while the bidders in T^−_{v,i} share a dummy item with no value). We construct the following allocation problem with 2 bidders. The items are a subset of the original items, reduced to those allocated (partially or fully) to v, denoted as I_1, I_2, ..., I_r, plus r special items I'_1, ..., I'_r.

Bidder 1 has the same budget constraint as the bidder represented by v, and the same bids on I_1, ..., I_r. Bidder 1 bids 0 on the special items I'_1, ..., I'_r. Bidder 2 has an unbounded budget constraint, and bids C_i^− for each special item I'_i and max{0, C_i − C_i^−} for each original item I_i.
The new allocation problem is a reduction of the original rounding problem: any rounding possibility matches an allocation with the same benefit. Therefore, if we approximate the reduced allocation problem, we get an approximation to the original rounding problem. Since there are only two bidders in the reduced problem, we use Algorithm DPA of Theorem 3 to approximate an optimal allocation.
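The construction of the two-bidder instance can be sketched in Python as follows. The sketch uses exhaustive enumeration in place of DPA, which is only sensible for tiny r; the function names and the item ordering (shared items first, then special items) are illustrative assumptions.

```python
import math
from itertools import product
from typing import List, Tuple

def reduced_instance(d_v: float, b_v: List[float], c: List[float],
                     c_minus: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Two-bidder instance built at a bidder center v.

    Items 0..r-1 stand for the shared items I_1..I_r and items r..2r-1 for the
    special items I'_1..I'_r. Row 0 is bidder 1 (plays v), row 1 is bidder 2.
    c[i] / c_minus[i]: revenue of subtree i when v does not / does get item i.
    """
    r = len(b_v)
    bidder1 = list(b_v) + [0.0] * r
    bidder2 = [max(0.0, c[i] - c_minus[i]) for i in range(r)] + list(c_minus)
    return [bidder1, bidder2], [d_v, math.inf]

def best_allocation(bids: List[List[float]], budgets: List[float]) -> float:
    """Exact optimum by enumeration; only for tiny inputs (the text uses DPA here)."""
    best = 0.0
    for owner in product(range(len(budgets)), repeat=len(bids[0])):
        spent = [0.0] * len(budgets)
        for j, i in enumerate(owner):
            spent[i] += bids[i][j]
        best = max(best, sum(min(d, s) for d, s in zip(budgets, spent)))
    return best
```

With d_v = 1, b_v = [1, 1], C = [3, 2] and C^− = [1, 2], the best reduced allocation is worth 6: v takes I_2 (which her second subtree does not need, since C_2 = C_2^−), and subtree 1 keeps I_1, giving 1 + 3 + 2.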

The following holds for SOR (proof omitted): 

Theorem 6. Algorithm SOR has an approximation ratio of at most e/(e−1) + ε and a polynomial running time for any ε > 0.

By applying both SOR and the sequential rounding discussed at the beginning of this section, we can get rid of the additional ε term and guarantee an approximation ratio of e/(e−1). This bound is not known to be tight: the lower bound for rounding algorithms (presented in Section 4.5), which also holds for this algorithm, is only 4/3.



4.4 Bidders with Identical Budget Constraints 

In this section we show improved approximation bounds for RR in the case where all bidders have the same budget constraint. The approximation ratio of e/(e−1) is due to a bound of 1 − (1 − 1/r)^r on the expected valuation of each bidder, where r is the number of items owned or shared by a bidder. Actually, by considering the items fully assigned to the same bidder as one large item, we achieve a tighter bound of 1 − (1 − 1/(a+1))^{a+1}, where a is the number of partial assignments of items to a bidder. When a goes to infinity, the bound goes to 1 − 1/e; however, not all bidders will have infinitely many fractional items. If all bidders have identical budget constraints, we can use this property to derive a tighter analysis for RR.

Assuming the bipartite graph G, constructed from the solution to the LP, has
only one component, let G' be the subgraph of G where nodes corresponding to
items that are allocated only to one bidder are removed. Therefore, G' represents 
only the items shared among several bidders. Let Ni denote the node in G' 
corresponding to bidder i and let Kj denote the node in G' corresponding to 
item j. We define the following sets: 

Definition 1. Let R_a be the set of bidders whose nodes N_i in G' satisfy Deg(N_i) = a, and let S_a be the set of items whose nodes K_j in G' satisfy Deg(K_j) = a.

The sets Ra and Sa have the following property, which is based on the fact 
that G' is a tree: 

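Lemma 2. |R_1| = 2 + Σ_{a≥3} (a − 2)|R_a| + Σ_{a≥3} (a − 2)|S_a|.

(Every item node of G' has degree at least 2, so the leaves of G' are exactly the nodes of the bidders in R_1; the identity is the standard count of the leaves of a tree with at least two nodes.)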
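The identity of Lemma 2 can be checked on small trees with the following Python sketch. It assumes G' is given as a list of (bidder, item) edges forming a tree in which every item has degree at least 2; `lemma2_holds` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

def lemma2_holds(edges: Iterable[Tuple[int, int]]) -> bool:
    """Check |R_1| = 2 + sum_{a>=3} (a-2)|R_a| + sum_{a>=3} (a-2)|S_a| on a tree G'.

    edges are (bidder, item) pairs of G'; every item is assumed to have degree
    >= 2, so the leaves of the tree are exactly the bidders in R_1.
    """
    deg = Counter()
    for bidder, item in edges:
        deg[("N", bidder)] += 1
        deg[("K", item)] += 1
    r1 = sum(1 for (kind, _), a in deg.items() if kind == "N" and a == 1)
    # Nodes of degree a >= 3 (bidders in R_a and items in S_a) each add a - 2 leaves.
    rhs = 2 + sum(a - 2 for a in deg.values() if a >= 3)
    return r1 == rhs
```

For instance, an item shared by three bidders, one of whom also shares a second item with a fourth bidder, has three R_1 bidders, matching 2 + (3 − 2).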

According to Lemma 2, the number of bidders who have only one fraction of an item is fairly large. There are two of these bidders to begin with; each bidder that has more than two fractions of items forces another bidder into R_1 for each extra fraction; and each item shared among three or more bidders adds another bidder to R_1 for each share beyond the second. We can use this property to prove the following:



Theorem 7. When all bidders have the same budget constraint, the approximation ratio of Algorithm RR is at most 27/19 ≈ 1.421.



Proof. Without loss of generality assume that the identical budget constraint is 1, and that G' is a tree. For each bidder from R_a (a ≥ 3) we match a − 2 bidders from R_1. By Lemma 2 this is possible, leaving at least 2 bidders from R_1 unmatched.

We first assume all bidders reach their budget constraint. Bidders from R_2 are unmatched, and have an expected valuation of at least 1 − (1 − 1/3)^3 = 19/27 each. Bidders from set R_a have an expected valuation of at least 1 − (1 − 1/(a+1))^{a+1}, which is less than 19/27 for a ≥ 3, but they are matched with a − 2 bidders from R_1 who have an expected valuation of at least 1 − (1 − 1/2)^2 = 3/4 each. On average, the expected valuation is larger than 19/27 for any a ≥ 3.

If not all bidders reach their budget constraint, by Observation 1 only one bidder u has not met the constraint. If u belongs to R_a, then the expected valuation of u is still at least 1 − (1 − 1/(a+1))^{a+1} times the valuation achieved by the fractional assignment. If bidder u participates in a match, the ratio between the total expected valuation of the bidders in the match and the fractional valuation remains above 19/27 as long as u does not belong to R_1. If u ∈ R_1, it is possible to replace u with an unmatched bidder from R_1, since by Lemma 2 at least 2 bidders in R_1 are unmatched.

For each group of matched bidders, the ratio between the total expected valuation and the fractional valuation is at least 19/27. Unmatched bidders remain only in R_1 or R_2 and therefore also have an expected valuation of at least 19/27 times the fractional valuation. Therefore, the approximation ratio is at most 27/19 ≈ 1.421. □
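The constants in this proof can be recomputed exactly with the Python sketch below (helper names are hypothetical). It evaluates the per-bidder bound 1 − (1 − 1/(a+1))^{a+1} with exact fractions and the average over one R_a bidder together with its a − 2 matched R_1 bidders, for a ≥ 3 up to a finite cutoff.

```python
from fractions import Fraction

def unit_bound(a: int) -> Fraction:
    """Bound 1 - (1 - 1/(a+1))^(a+1) for a bidder with a fractional items."""
    return 1 - (1 - Fraction(1, a + 1)) ** (a + 1)

def group_average(a: int) -> Fraction:
    """Average bound of one R_a bidder (a >= 3) and its a - 2 matched R_1 bidders."""
    return (unit_bound(a) + (a - 2) * unit_bound(1)) / (a - 1)

r1 = unit_bound(1)            # 3/4
r2 = unit_bound(2)            # 19/27, the unmatched R_2 bidders
worst_group = min(group_average(a) for a in range(3, 50))
print(r1, r2, worst_group, float(1 / r2), float(1 / worst_group))
```

In this range the weakest matched group is a = 3, with average 367/512 ≈ 0.7168, which exceeds 19/27 ≈ 0.7037, so the R_2 bidders determine the bound 27/19.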



The proof of Theorem 7 shows that bidders from R_2 are the bottleneck of the analysis, as they are not matched with bidders from R_1. If this bottleneck can be resolved, the approximation ratio could drop to 1.3951, which is induced by bidders from R_3: they are matched with bidders from R_1, so their average expectation is ½(1 − (3/4)^4 + 3/4) ≈ 0.7168 = (1.3951)^{−1}. By using the sets S_a, the following theorem states that this improvement is indeed achievable.



Theorem 8. When all bidders have the same budget constraint, the approxima- 
tion ratio of algorithm RR is at most 1.3951. 

The following lower bound nearly matches the upper bound: 

Theorem 9. The approximation ratio of Algorithm RR with identical budget
constraints is at least 1.3837.
4.5 Fractional Versus Integral Allocations

In this section we derive a general lower bound for algorithms that are based on 
solving the LP and rounding the fractional assignments. The following theorem 
proves a lower bound for any algorithm that uses the relaxed LP. 

Theorem 10. LP has an integrality ratio of at least 4/3. Also, the optimal solution of the IP can be 4/3 times any solution that is based on rounding nonzero fractional allocations of the LP.

Proof. The integrality ratio of 4/3 is achieved in the following case. Consider an auction with 2 bidders, A and B, and 3 items x, y and z. Bidder A bids 1 for x, 0 for y and 2 for z, and has a budget constraint of 2. Bidder B bids 0 for x, 1 for y and 2 for z, and has a budget constraint of 2. Optimally, A gets x, B gets y and either bidder gets z, and the revenue is 3. However, the LP produces an optimal fractional assignment which divides z between both bidders and achieves a revenue of 4.

The ratio of 4/3 between the optimal integral solution and any solution that is based on rounding the LP solution is achieved in a similar auction, but now both A and B bid 1 for x and y and bid 2 for z (the budget constraints remain 2). Optimally, A gets both x and y while B gets z, and the revenue is 4. However, the LP might produce a fractional assignment such as x for A, y for B, and z divided between both bidders. The revenue is also 4, but any rounding technique will grant z either to A or to B; either way the revenue is 3. □
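Both examples are small enough to verify by brute force. The Python sketch below (hypothetical helper names) enumerates all integral allocations, all roundings that respect the support of the fractional assignment, and the fractional objective.

```python
from itertools import product
from typing import List, Optional, Sequence

def revenue(bids: List[List[float]], budgets: List[float],
            owner: Sequence[Optional[int]]) -> float:
    """Each bidder pays min(budget, sum of her bids on the items she receives)."""
    spent = [0.0] * len(budgets)
    for j, i in enumerate(owner):
        if i is not None:
            spent[i] += bids[i][j]
    return sum(min(d, s) for d, s in zip(budgets, spent))

def integral_opt(bids: List[List[float]], budgets: List[float]) -> float:
    """Optimal integral revenue by enumerating the owner of every item."""
    return max(revenue(bids, budgets, o)
               for o in product(range(len(budgets)), repeat=len(bids[0])))

def best_rounding(x: List[List[float]], bids: List[List[float]], budgets: List[float]) -> float:
    """Best integral allocation that gives each item only to a bidder with x_ij > 0."""
    choices = [[i for i in range(len(budgets)) if x[i][j] > 0] or [None]
               for j in range(len(bids[0]))]
    return max(revenue(bids, budgets, o) for o in product(*choices))

def fractional_value(x: List[List[float]], bids: List[List[float]]) -> float:
    return sum(x[i][j] * bids[i][j] for i in range(len(x)) for j in range(len(bids[0])))
```

With items ordered (x, y, z) and the fractional assignment [[1, 0, 0.5], [0, 1, 0.5]], the first auction gives an integral optimum of 3 against a fractional value of 4, and in the second auction the integral optimum is 4 while the best rounding of that assignment earns only 3.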
Abstract. In this paper, we investigate the test set problem and its
variations that appear in a variety of applications. In general, we are
given a universe of objects to be “distinguished” by a family of “tests”,
and we want to find the smallest sufficient collection of tests. In the 
simplest version, a test is a subset of the universe and two objects are 
distinguished by our collection if one test contains exactly one of them. 
Variations allow tests to be multi-valued functions or unions of “basic” 
tests, and different notions of the term distinguished. An important ver- 
sion of this problem that has applications in DNA sequence analysis has 
the universe consisting of strings over a small alphabet and tests that 
are detecting presence (or absence) of a substring. For most versions of 
the problem, including the latter, we establish matching lower and upper 
bounds on approximation ratio. When tests can be formed as unions of 
basic tests, we show that the problem is as hard as the graph coloring 
problem. 



1 Introduction and Motivation 

One of the test set problems was on the classic list of NP-complete problems 
given by Garey and Johnson [6]; these problems arise naturally in many other 
applications. Below we provide an informal description of the basic problem 
with its motivating applications in various settings; precise descriptions and 
definitions appear in Section 1.1. In every version of the test set problem, we
are given a universe of objects, a family of subsets (tests) of the universe and a
notion of distinguishability of pairs of elements of the universe by a collection of 
these tests. Our goal is to select a subset of these tests of minimum size that 
distinguishes every pair of elements of the universe. This framework captures 
problems in several areas in bioinformatics and biological modeling. 

Minimum Test Collection Problem: This problem has applications in di- 
agnostic testing. Here a collection of tests distinguishes two objects if a test 
from the collection contains exactly one of them. Garey and Johnson [6, p. 71] gave a proof of NP-hardness of this problem via a reduction from the 3-dimensional matching problem. Moret and Shapiro [12] discussed some heuristics and experimental results for this problem. Finally, very recently the authors in [2,8] established a (1 − ε) ln n lower bound for approximation for any polynomial-time algorithm under standard complexity-theoretic assumptions, where n is the number of objects and ε > 0 is an arbitrary constant.

Condition Cover Problem: Karp et al. [10] considered a problem of verifying 
a multi-output feedforward Boolean circuit as a model of biological pathways. 
This problem can be phrased like the Minimum Test Collection Problem, 
except that two elements are distinguished by a collection of tests if one test contains exactly one of them, and another contains both or none of them.

String Barcoding Problem: In this problem, discussed by Rash and Gusfield [13], the universe U consists of sequences (strings), and for every possible string v we can form a test T_v as the collection of strings from U in which v appears. The name “string barcoding” derives from the fact that
the Boolean vector indicating the occurrence (as a substring) of the tests 
from an arbitrary collection of tests in a given input sequence is referred to 
as the “barcode” of the given sequence with respect to this collection of tests. 
Motivations for investigating these problems come from several sources such 
as: (a) database compression and fast database search for DNA sequences 
and (b) DNA microarray designs for efficient virus identification in which the 
immobilized DNA sequences at an array element are from a set of barcodes. 
In [13], Rash and Gusfield left open the exact complexity and approximability of String Barcoding. We also consider a version in which a test can be defined by a set T of strings, with some limit on the set size, and u ∈ U passes test T if one of the strings in T is a substring of u; such tests are as feasible in practice as the one-string tests.

Minimum Cost Probe Set Problem with a Threshold: This problem is 
very similar to String Barcoding and it was considered by Borneman et al. [3].
They used this in [3] for minimizing the number of oligonucleotide probes 
needed for analyzing populations of ribosomal RNA gene (rDNA) clones by 
hybridization experiments on DNA microarrays. Borneman et al. [3] noted 
that this problem was NP-complete assuming that the lengths of the se- 
quences in the prespecified set were unrestricted, but no other nontrivial 
theoretical results are known. 



1.1 Notation and Definitions 

Each problem discussed in this paper is obtained by fixing parameters in our general test set problem TS^Γ(k). The following notation and terminology is used throughout this paper:

— [i, j] denotes the set of integers {i, i + 1, ..., j − 1, j}.

— P(S) = {A : A ⊆ S} denotes the power set of S.

— |X| denotes the cardinality (resp. length) of X if X is a set (resp. sequence).

— For two sequences (strings) v and x over an alphabet Σ, v is a substring of x (denoted by v ≺ x) if x = uvw for some u, w ∈ Σ*.

— For two sets of numbers A and B and a number a, a × A denotes the set {ai | i ∈ A} and A + B denotes the set {a + b | a ∈ A and b ∈ B}.



Definition 1. (Problem TS^Γ(k) with parameters Γ ⊆ P([0, 2]) and a positive integer k)

Instance: (n, S) where S ⊆ P([0, n − 1]).

Terminologies:

— A k-test is a union of at most k sets from S.

— For a γ ∈ Γ and two distinct elements x, y ∈ [0, n − 1], a k-test T γ-distinguishes x and y if |{x, y} ∩ T| ∈ γ.

Valid solutions: A collection 𝒯 of k-tests such that

(∀x, y ∈ [0, n − 1] ∀γ ∈ Γ) x ≠ y ⇒ ∃T ∈ 𝒯 such that T γ-distinguishes x and y.

Objective: minimize |𝒯|.



An example to illustrate Definition 1: Let n = 3, k = 1, Γ = {{1}} and S = {{0}, {1}, {0, 1}}. Then 𝒯 = {{0}, {0, 1}} is a valid solution, since the 1-test {0, 1} {1}-distinguishes 0 from 2 as well as 1 from 2, while the 1-test {0} {1}-distinguishes 0 from 1.
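A small Python checker for the validity condition of Definition 1 (a sketch; the function names are hypothetical). Each element of `tests` is assumed to be a k-test that has already been formed as a union of at most k sets from S; the checker does not build the unions itself.

```python
from itertools import combinations
from typing import List, Set

def gamma_distinguishes(test: Set[int], x: int, y: int, gamma: Set[int]) -> bool:
    """A test T gamma-distinguishes x and y iff |{x, y} ∩ T| is in gamma."""
    return len({x, y} & test) in gamma

def is_valid_solution(n: int, tests: List[Set[int]], Gamma: List[Set[int]]) -> bool:
    """Every pair x != y in [0, n-1] must be gamma-distinguished for every gamma in Gamma."""
    return all(
        any(gamma_distinguishes(t, x, y, g) for t in tests)
        for x, y in combinations(range(n), 2)
        for g in Gamma
    )
```

The example above is valid for Γ = {{1}}, but the same collection is not valid for Γ = {{1}, {0, 2}}: no test contains both or neither of 0 and 2.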

Now we precisely state the relationship of the TS^(fc) problem to several 
other problems in bioinformatics and biological modeling that we discussed be- 
fore: 



Minimum Test Collection Problem (Garey and Johnson [6]): This is precisely TS^{{1}}(1).

Condition Cover Problem (Karp et al. [10]): Assuming that the allowed perturbations are given as part of the input, this problem is identical to TS^{{1},{0,2}}(1).

String Barcoding Problem: Define a k-sequence as a collection of at most k distinct sequences. In this problem, considered by Rash and Gusfield [13] for the case k = 1, we are given a set S of sequences over some alphabet Σ. For a fixed set of m k-sequences t = (t_0, ..., t_{m−1}), the barcode code(s, t) of each s ∈ S is defined to be the Boolean vector (c_0, c_1, ..., c_{m−1}), where c_i is 1 iff there exists a sequence x ∈ t_i such that x ≺ s. We say that t defines a valid barcode if for any two distinct strings s, s' ∈ S, code(s, t) is different from code(s', t). The string barcoding problem over alphabet Σ, denoted by SB^Σ(k), has a parameter k ∈ N and is defined as follows:

Instance: (n, S) where S ⊆ Σ* and 1 ≤ k ≤ n = |S|.

Valid solutions: a set of k-sequences t defining a valid barcode.
Objective: minimize |t|.

SB^Σ(k) is a special case of TS^{{1}}(k) in which U = S and for each substring p of each sequence in S there is a test {s ∈ S : p ≺ s}; valid barcodes can be identified with valid sets of k-tests.
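The barcode definition translates directly into Python, since `p in s` tests whether p is a substring of s. A minimal sketch, assuming S contains distinct strings and each k-sequence is given as a list of strings; the function names are hypothetical.

```python
from typing import List, Tuple

def code(s: str, t: List[List[str]]) -> Tuple[int, ...]:
    """Barcode of s: c_i = 1 iff some sequence of the k-sequence t[i] is a substring of s."""
    return tuple(int(any(p in s for p in ti)) for ti in t)

def is_valid_barcode(S: List[str], t: List[List[str]]) -> bool:
    """t defines a valid barcode iff distinct strings of S get distinct codes."""
    codes = [code(s, t) for s in S]
    return len(set(codes)) == len(codes)
```

For S = {AC, CA, AA}, the single-letter tests A and C give AC and CA the same code (1, 1), whereas the tests AC and CA produce the distinct codes (1, 0), (0, 1) and (0, 0).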
Minimum Cost Probe Set with Threshold r (Borneman et al. [3]):

This problem, denoted by MCP^Σ(r), is a variation of TS^{{1}}(1). Denote by oc(x, y) the number of occurrences of x in y as a substring. For a fixed set of m sequences t = (t_0, t_1, ..., t_{m−1}), the r-barcode code(s, t) of any sequence s is defined to be the vector (c_0, c_1, ..., c_{m−1}), where c_i = min{r, oc(t_i, s)}. Given a set S of sequences over some alphabet Σ, t defines a valid r-barcode if for any two distinct strings s, s' ∈ S, code(s, t) is different from code(s', t). MCP^Σ(r) is now defined as follows:

Instance: (n, r, S, P) where S, P ⊆ Σ* and |S| = n.

Valid solutions: a set of sequences t ⊆ P defining a valid r-barcode.
Objective: minimize |t|.

If P is the set of all substrings of sequences in S, MCP^Σ(1) is precisely SB^Σ(1). All our results on SB^Σ(1) apply to MCP^Σ(r) with appropriate modifications.
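A Python sketch of the r-barcode follows. It assumes that overlapping occurrences are counted in oc(x, y), which the definition leaves implicit; the function names are hypothetical.

```python
from typing import List, Tuple

def oc(x: str, y: str) -> int:
    """Occurrences of x in y as a substring (assumption: overlapping ones count)."""
    return sum(1 for p in range(len(y) - len(x) + 1) if y.startswith(x, p))

def r_code(s: str, t: List[str], r: int) -> Tuple[int, ...]:
    """r-barcode of s: entry i is min(r, oc(t_i, s))."""
    return tuple(min(r, oc(ti, s)) for ti in t)

def is_valid_r_barcode(S: List[str], t: List[str], r: int) -> bool:
    """Distinct strings of S must receive distinct r-barcodes."""
    codes = [r_code(s, t, r) for s in S]
    return len(set(codes)) == len(codes)
```

With the single probe AA, the strings AAA, AA and A get counts 2, 1 and 0. A threshold r = 1 collapses the first two, while r = 2 separates all three, which shows why a larger threshold can save probes.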



2 Summary of Our Results 

We provide matching upper and lower bounds on approximation ratios of polynomial time algorithms for TS^{{1}}(1), TS^{{1},{0,2}}(1), SB^Σ(1) and MCP^Σ(r), and strong lower bounds on approximation ratios of polynomial time algorithms for TS^Γ(k) and SB^Σ(k) for large k; these results are summarized in Table 1.



Table 1. Summary of our approximability results: (n, S) is an input instance of TS^Γ(k) and SB^Σ(k), (n, r, S, P) is an input instance of MCP^Σ(r), ℓ is the maximum length of any sequence in S, L is the total length of all sequences in S, and ε and δ are constants. The column “Assumptions” contains sufficient condition(s) for the respective lower bound.



Problem              Upper bound: time     Upper bound: ratio   Lower bound: ratio   Lower bound: assumptions              Theorem(s)
TS^{{1}}(1)          O(n^2 |S|)            1 + ln n             (1 − ε) ln n         NP ⊄ DTIME(n^{log log n})             1 and 5
TS^{{1},{0,2}}(1)    O(n^2 |S|)            1 + ln 2 + ln n      (1 − ε) ln n         NP ⊄ DTIME(n^{log log n})             1 and 5
SB^Σ(1)                                    1 + ln n             (1 − ε) ln n         NP ⊄ DTIME(n^{log log n}); |Σ| > 1    1 and 5
MCP^Σ(r)             O(n^2 |P| + L |P|)    [1 + o(1)] ln n      (1 − ε) ln n         NP ⊄ DTIME(n^{log log n}); |Σ| > 1    1 and 5
…                                                               n^ε                  NP ≠ co-RP; 0 < ε < δ < 1             9
SB^Σ(n^δ)                                                       n^ε                  NP ≠ co-RP; 0 < ε < δ < 1             9
Techniques Used

(a) Our algorithm to achieve the tight approximation bound in Theorem 1 for TS^{{1}}(1), TS^{{1},{0,2}}(1) and MCP^Σ(r) is a greedy algorithm that selects tests

based on information content defined in terms of the change in the partition 
of the universe when the test is applied. This notion is directly related to the 
Shannon information complexity [1,14]. A careful analysis yields an upper bound 
on the approximation ratio that matches the lower bound in Theorem 5 within 
a small additive term. We believe the analysis will be useful in the context of 
analyzing other problems involving recursive partitioning of a given universe as 
well. 

(b) The inapproximability results of Theorem 5 are proved by approximation 
preserving reductions from the set cover problem. To handle the barcode problem 
for Σ = {0, 1} we introduce an artificial intermediate problem (the “test set with
order” problem) in which some tests are provided almost for free but they help 
very little in constructing a good set of tests. This roughly corresponds to the 
fact that we cannot avoid tests that do not correspond to sets in the original set 
cover instance, but we can make them cheap. 

(c) The inapproximability results in Theorem 9 are obtained by approximation 
preserving reductions from the graph coloring problem. 

Comparison of our results with those in [8,2]: The authors in [8,2] proved a (1 − ε) ln n lower bound for approximation for TS^{{1}}(1). In this paper, we prove a lower bound of (1 − ε) ln n for an extremely restricted special case of TS^{{1}}(1) that is of utmost importance to the bioinformatics community in detecting unknown virus sequences and designing probes for DNA microarrays. The proof in [8,2] from set cover to TS^{{1}}(1) does not seem to be easily transformable to provide a lower bound for SB^{{0,1}}(1) with a similar quality of non-approximability because of the special nature of SB^{{0,1}}(1). We therefore needed to introduce an artificial intermediate problem (the “test set with order” problem, denoted by TSO), which we could then translate to SB^{{0,1}} in a non-trivial manner. It should be noted that, for general k, TSO is neither equivalent to nor a special case of TS^{{1}}(1).

Notational simplifications: We will skip (1) in TS^{{1}}(1), TS^{{1},{0,2}}(1) and SB^{{0,1}}(1), write “{1}-distinguishes” simply as “distinguishes” or “separates”, and write 1-tests simply as tests. Also, unless otherwise stated, all “computations”, “transformations” or “reductions” take polynomial time.

The Map. Proofs of some of the claims in Theorems 1, 5 and 9 appear in Sections 3, 4 and 5, respectively.

3 Approximation Algorithms for Test Set and Minimum Cost Probe Problems

The Set Cover (SC) problem is defined on an input instance $(U, S)$ such that $S \subseteq \mathcal{P}(U)$, with the goal of finding a $C \subseteq S$ such that $\bigcup_{A \in C} A = U$ and $|C|$ is minimized. We can translate $\mathrm{TS}^{\{1\}}$ to SC as follows. Given an instance $(n, S)$ of $\mathrm{TS}^{\{1\}}$, we define the instance $(U, \tau(S))$ where $U = \{e \subseteq [0, n-1] : |e| = 2\}$, $\tau(T) = \{e \in U : |e \cap T| = 1\}$, and $\tau(S) = \{\tau(T) : T \in S\}$. The best proven approximation ratio for SC is achieved by a greedy heuristic [9] that, starting from the empty partial set cover, repeatedly adds to the solution a set that maximizes the number of elements not yet covered. This heuristic for set cover runs in $O(\sum_{T \in S} |\tau(T)|)$ time and has an approximation ratio of $1 + \ln(\max_{T \in S} |\tau(T)|)$. Since $\max_{T \in S} |\tau(T)| = \max_{T \in S} |T|(n - |T|) \le n^2/4$, the above translation offers an $O(n^2|S|)$ time greedy heuristic for $\mathrm{TS}^{\{1\}}$ with an approximation ratio of $1 + \ln(n^2/4) = 1 + 2\ln n - \ln 4$. A similar reduction from $\mathrm{TS}^{\{1\},\{0,2\}}$ (resp. $\mathrm{MCP}(r)$) to the SC problem can also be given, providing a greedy heuristic with an approximation ratio of $2\ln n$ minus a constant (resp. $2\ln n$). The main result of this section improves upon these simple heuristics as follows.
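The translation to SC and the greedy heuristic can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: tests are Python sets over `range(n)`, and the helper names `tau` and `greedy_test_set` are hypothetical. It scans all pairs explicitly and only illustrates the reduction.

```python
from itertools import combinations

def tau(T, n):
    """tau(T): the pairs e = {i, j} of [0, n-1] with |e ∩ T| = 1."""
    return {frozenset(e) for e in combinations(range(n), 2) if len(T & set(e)) == 1}

def greedy_test_set(n, S):
    """Greedy set cover on (U, tau(S)); returns indices into S, or None if infeasible."""
    covers = [tau(T, n) for T in S]
    uncovered = {frozenset(e) for e in combinations(range(n), 2)}
    chosen = []
    while uncovered:
        # pick the test that separates the most not-yet-separated pairs
        best = max(range(len(S)), key=lambda t: len(covers[t] & uncovered))
        gain = covers[best] & uncovered
        if not gain:
            return None  # the remaining pairs cannot be separated by any test
        chosen.append(best)
        uncovered -= gain
    return chosen

print(greedy_test_set(4, [{0, 1}, {0, 2}, {1}, {3}]))  # [0, 1]
```

Ties are broken in favor of the lowest index, since Python's `max` returns the first maximal element.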

Theorem 1. There is an $O(n^2|S|)$ time approximation algorithm for $\mathrm{TS}^{\mathcal{C}}$ with approximation ratio $1 + \ln n$ for $\mathcal{C} = \{\{1\}\}$ and $1 + \ln 2 + \ln n$ for $\mathcal{C} = \{\{1\},\{0,2\}\}$. There is an $O(n^2|P| + L|P|)$ time approximation algorithm for $\mathrm{MCP}(r)$ with approximation ratio $1 + \ln n + \ln\log_2(r'+1)$, where $r' = \min\{r, n\}$ and $L$ is the total length of the sequences in $S$.



3.1 Proof of Theorem 1 for $\mathrm{TS}^{\{1\}}$

In this section we provide a greedy heuristic for $\mathrm{TS}^{\{1\}}$ running in $O(n^2|S|)$ time with an improved approximation ratio of $1 + \ln n$. Notice that this upper bound almost matches the lower bound in Theorem 5 for a special case of $\mathrm{TS}^{\{1\}}$.

First, we consider the problem $\mathrm{TS}^{\{1\}}$. In the definition below and throughout the rest of this section we use $\mathcal{T} + T$ to denote $\mathcal{T} \cup \{T\}$.

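Definition 2. A set of tests $\mathcal{T} \subseteq S$ defines the following:

- an equivalence relation $\overset{\mathcal{T}}{\equiv}$ on $[0, n-1]$ given by $i \overset{\mathcal{T}}{\equiv} j$ if and only if $\forall T \in \mathcal{T}\colon (i \in T \Leftrightarrow j \in T)$;

- a set of permutations $\Pi_{\mathcal{T}} = \{\pi \in \text{permutations of } [0, n-1] : \forall i \in [0, n-1]\; i \overset{\mathcal{T}}{\equiv} \pi(i)\}$;

- entropy $H_{\mathcal{T}} = \log_2 |\Pi_{\mathcal{T}}|$;

- information content of a $T \in S$ with respect to $\mathcal{T}$, $IC(T, \mathcal{T}) = H_{\mathcal{T}} - H_{\mathcal{T}+T} = \log_2 \dfrac{|\Pi_{\mathcal{T}}|}{|\Pi_{\mathcal{T}+T}|}$.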

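Our definition of entropy is very similar to the one suggested in [12]. Suppose that the equivalence relation $\overset{\mathcal{T}}{\equiv}$ on $[0, n-1]$ produces $q$ equivalence classes of sizes $s_1, s_2, \ldots, s_q$. Then the entropy suggested in [12] is $\log_2\left(\prod_{i=1}^{q} s_i^{s_i}\right)$, whereas our entropy $H_{\mathcal{T}}$ is $\log_2\left(\prod_{i=1}^{q} s_i!\right)$, since a permutation in $\Pi_{\mathcal{T}}$ permutes each equivalence class independently.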
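For tiny instances, the identity $H_{\mathcal{T}} = \log_2\left(\prod_{i} s_i!\right)$ can be checked by brute force. The sketch below (hypothetical helper names; tests are Python sets) enumerates all permutations of $[0, n-1]$ and counts those that map every element into its own equivalence class.

```python
from itertools import permutations
from math import factorial, log2, prod

def classes(n, tests):
    """Equivalence classes of [0, n-1]: i ~ j iff every test contains both or neither."""
    groups = {}
    for i in range(n):
        groups.setdefault(tuple(i in T for T in tests), []).append(i)
    return list(groups.values())

def pi_size_brute_force(n, tests):
    """|Pi_T| by enumerating all n! permutations (only feasible for tiny n)."""
    label = {i: c for c, cls in enumerate(classes(n, tests)) for i in cls}
    return sum(all(label[p[i]] == label[i] for i in range(n))
               for p in permutations(range(n)))

def entropy(n, tests):
    """H_T = log2(prod over classes of s_i!)."""
    return log2(prod(factorial(len(c)) for c in classes(n, tests)))

print(pi_size_brute_force(5, [{0, 1, 2}]))  # 12 = 3! * 2!
```

In this example $IC(\{0,1,2\}, \emptyset) = \log_2(120/12) = \log_2 10$, i.e. $\log_2\binom{5}{3}$.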

The information content heuristic (ICH for short) is the following simple 
greedy heuristic: 
𝒯 = ∅
while H_𝒯 ≠ 0 do
  select a T ∈ S − 𝒯 that maximizes IC(T, 𝒯)
  𝒯 = 𝒯 + T
endwhile

The correctness of ICH follows from the fact that $H_{\mathcal{T}} = 0$ implies that the equivalence classes of $\overset{\mathcal{T}}{\equiv}$ are the $n$ singleton sets $\{0\}, \{1\}, \ldots, \{n-1\}$, and from the fact that if $H_{\mathcal{T}} \ne 0$ then there exists a $T \in S - \mathcal{T}$ with $IC(T, \mathcal{T}) > 0$ (otherwise our problem instance has no feasible solution). It is also not very difficult to implement this algorithm efficiently within our claimed time bounds.

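To implement ICH, we iteratively maintain the equivalence classes of $\overset{\mathcal{T}}{\equiv}$ as sorted lists. We also precompute and store $\log_2(i!)$ for each $i \in [1, n]$. Given a specific $T \in S - \mathcal{T}$, it is easy to compute in $O(n)$ time the equivalence classes of $\overset{\mathcal{T}+T}{\equiv}$ from the equivalence classes of $\overset{\mathcal{T}}{\equiv}$, since an equivalence class $E$ of $\overset{\mathcal{T}}{\equiv}$ is either an equivalence class of $\overset{\mathcal{T}+T}{\equiv}$ or it is partitioned into two equivalence classes $E_1 = E \cap T$ and $E_2 = E - E_1$ of $\overset{\mathcal{T}+T}{\equiv}$; the first case contributes nothing to $IC(T, \mathcal{T})$, while the second case adds $\log_2 \binom{|E|}{|E_1|} = \log_2(|E|!) - \log_2(|E_1|!) - \log_2(|E_2|!)$ to $IC(T, \mathcal{T})$. Finally, notice that the while loop is executed at most $n$ times.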
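A minimal Python sketch of ICH along the lines just described, assuming tests are given as Python sets so that membership costs $O(1)$; classes are kept as plain lists rather than sorted lists, and the name `ich` is hypothetical. Each candidate test is scored in $O(n)$ time by summing $\log_2\binom{|E|}{|E \cap T|}$ over the classes it splits, using the precomputed table of $\log_2(i!)$.

```python
from math import log2

def ich(n, S):
    """Information content heuristic for TS^{1}: returns indices into S, or None."""
    # log2(i!) for i in [0, n], precomputed once
    log2_fact = [0.0] * (n + 1)
    for i in range(2, n + 1):
        log2_fact[i] = log2_fact[i - 1] + log2(i)
    classes = [list(range(n))]               # classes of the current relation
    chosen, used = [], set()
    while any(len(E) > 1 for E in classes):  # i.e. H_T != 0
        best, best_ic = None, 0.0
        for t, T in enumerate(S):
            if t in used:
                continue
            ic = 0.0
            for E in classes:                # O(n) work per candidate test
                a = sum(1 for x in E if x in T)
                if 0 < a < len(E):           # E splits: add log2 C(|E|, |E ∩ T|)
                    ic += log2_fact[len(E)] - log2_fact[a] - log2_fact[len(E) - a]
            if ic > best_ic:
                best, best_ic = t, ic
        if best is None:
            return None                      # no test has positive IC: infeasible
        chosen.append(best)
        used.add(best)
        T = S[best]
        classes = [part for E in classes
                   for part in ([x for x in E if x in T], [x for x in E if x not in T])
                   if part]
    return chosen

print(ich(4, [{0, 1}, {0, 2}, {1}, {3}]))  # [0, 1]
```

The function returns `None` when no remaining test has positive information content, which corresponds to an infeasible instance.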

Now we analyze the approximation ratio of ICH. We will use the convention $X = |X|$ for a set $X$.

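Lemma 2. If $\mathcal{T}_0 \subseteq \mathcal{T}_1$ then $IC(T, \mathcal{T}_0) \ge IC(T, \mathcal{T}_1)$.

Lemma 3. $IC(T, \emptyset) \le n$ for every test $T$.

Lemma 4. If $IC(T, \mathcal{T}) > 0$ then $IC(T, \mathcal{T}) \ge 1$.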
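Lemmas 2 to 4 can be sanity-checked (not proved) on random small instances by computing $IC(T, \mathcal{T})$ as a sum of $\log_2\binom{|E|}{|E \cap T|}$ over the classes $E$ of $\overset{\mathcal{T}}{\equiv}$. The helper name `ic` and the random sampling scheme below are our own.

```python
import random
from itertools import combinations
from math import comb, log2

def ic(T, tests, n):
    """IC(T, tests): sum over classes E of log2 C(|E|, |E ∩ T|)."""
    groups = {}
    for i in range(n):
        groups.setdefault(tuple(i in U for U in tests), []).append(i)
    return sum(log2(comb(len(E), sum(1 for x in E if x in T))) for E in groups.values())

n = 6
all_tests = [set(c) for r in range(n + 1) for c in combinations(range(n), r)]
rng = random.Random(0)
for _ in range(2000):
    T = rng.choice(all_tests)
    T1 = rng.sample(all_tests, 3)       # a random test set T_1 ...
    T0 = T1[:rng.randint(0, 3)]         # ... and a subset T_0 of it
    a, b = ic(T, T0, n), ic(T, T1, n)
    assert a >= b - 1e-9                # Lemma 2
    assert ic(T, [], n) <= n            # Lemma 3
    assert b == 0 or b >= 1 - 1e-9      # Lemma 4
```

Lemma 2 reflects Vandermonde's inequality $\binom{p}{a}\binom{q}{b} \le \binom{p+q}{a+b}$: refining a class can only lower the information content of a test.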

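Now we are ready for an amortized analysis of ICH. Suppose that an optimum solution of $(n, S)$ is $\mathcal{T}^* = \{T_1^*, \ldots, T_k^*\}$. During the execution of ICH, for a current partial test set $\mathcal{T}$, let $\mathcal{T}_i = \mathcal{T} + T_1^* + \cdots + T_i^*$ (accordingly, $\mathcal{T}_0 = \mathcal{T}$) and $h_i = IC(T_i^*, \mathcal{T}_{i-1})$. Notice that $\sum_{i=1}^{k} h_i = \sum_{i=1}^{k} \left(H_{\mathcal{T}_{i-1}} - H_{\mathcal{T}_{i-1}+T_i^*}\right) = H_{\mathcal{T}} - H_{\mathcal{T}_k} = H_{\mathcal{T}}$, since $H_{\mathcal{T}_k} = 0$. Let $h_i^0 \le n$ denote the initial value of $h_i$, i.e., the value of $h_i$ with $\mathcal{T} = \emptyset$.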

During the $j$-th iteration of the while loop, ICH selects a test $T$ (with, say, $IC(T, \mathcal{T}) = \Delta_j$) and changes $\mathcal{T}$ into $\mathcal{T} + T$. As a result, $H_{\mathcal{T}}$ drops by $\Delta_j$ and $h_i$ drops by some $\delta_{ij}$ with $\sum_{i=1}^{k} \delta_{ij} = \Delta_j$. This iteration adds 1 to the solution cost. We distribute this cost among the elements of $\mathcal{T}^*$ by charging $T_i^*$ with $\delta_{ij}/\Delta_j$. Because $h_i = IC(T_i^*, \mathcal{T}_{i-1}) \le IC(T_i^*, \mathcal{T})$ by Lemma 2, we know that $\Delta_j \ge h_i$, since otherwise ICH would select $T_i^*$ rather than $T$. Therefore reducing the current $h_i$ by $\delta_{ij}$ is associated with a charge that is at most $\delta_{ij}/h_i$. Let $m(h)$ be the supremum of possible sums of charges that some $T_i^*$ may receive starting from the time when $h_i = h$. By induction on the number of such positive charges we will show that $m(h) \le 1 + \ln h$. If this number is 1, then $h > 0$ and hence $\ln h \ge 0$ (by Lemma 4), while the charge is at most 1. In the inductive step, we consider a situation when $T_i^*$ starts with $h_i = h$, receives a single charge $\delta/h$, $h_i$ is reduced to $h - \delta$, and afterwards, by the inductive assumption, $T_i^*$ receives at most $m(h - \delta)$ in charges. Because $h - \delta > 0$, we know by Lemma 4 that $h - \delta \ge 1$. Therefore

$$m(h) \le m(h - \delta) + \frac{\delta}{h} \le 1 + \ln(h - \delta) + \frac{\delta}{h} \le 1 + \int_1^{h-\delta} \frac{dx}{x} + \int_{h-\delta}^{h} \frac{dx}{x} = 1 + \ln h.$$

By Lemmas 2 and 3, $h_i^0 \le n$ for every $i$, so the total cost of ICH, which equals the sum of all charges, is at most $\sum_{i=1}^{k} m(h_i^0) \le k(1 + \ln n)$. This proves our claim on the approximation ratio for $\mathrm{TS}^{\{1\}}$.
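The inequality $m(h) \le 1 + \ln h$ can also be checked numerically. The sketch below draws random sequences $h_0 \ge h_1 \ge \cdots > 0$ whose positive values are at least 1 (as Lemma 4 guarantees), ends them at 0, charges $\delta/h$ for each reduction of $h$ by $\delta$, and verifies that the total never exceeds $1 + \ln h_0$. This is an illustration under these simplified assumptions, not a replacement for the induction; `total_charge` is a hypothetical name.

```python
import math
import random

def total_charge(hs):
    """Sum of charges (h - h_next) / h along a non-increasing sequence ending at 0."""
    return sum((h - h_next) / h for h, h_next in zip(hs, hs[1:]))

rng = random.Random(1)
for _ in range(1000):
    h0 = rng.uniform(1, 50)
    # intermediate positive values lie in [1, h0] (Lemma 4); the last value is 0
    mids = sorted((rng.uniform(1, h0) for _ in range(rng.randint(0, 10))), reverse=True)
    hs = [h0] + mids + [0.0]
    assert total_charge(hs) <= 1 + math.log(h0) + 1e-9
```

Each intermediate step satisfies $(h - h')/h \le \ln(h/h')$ and the final step to 0 costs exactly 1, which is the discrete counterpart of the integral bound above.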

4 Inapproximability Results for Test Set, String Barcoding, and Minimum Cost Probe Set Problems

The NP-hardness of $\mathrm{TS}^{\{1\}}$ follows from the NP-hardness of the minimum test collection problem in [6], which is shown by a reduction from the 3-dimensional matching problem; minor modifications of this reduction can be used to prove the NP-hardness of $\mathrm{TS}^{\{1\},\{0,2\}}$ as well. The NP-hardness of $\mathrm{MCP}(r)$, via a reduction from the vertex cover problem, was mentioned without a proof in [3]. Our goal is to show that it is impossible (under reasonable complexity-theoretic assumptions) to approximate these problems any better than stated in Theorem 1.

Theorem 5. For any given constant $0 < \rho < 1$, it is impossible to approximate … restricted case of … $(r)$ within a factor of $(1 - \rho)\ln n$ in polynomial time unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log\log n})$.

Our proof of Theorem 5 proceeds in two stages:

- In Section 4.1 we introduce the Test Set with Order (TSO) problem and provide an approximation-preserving reduction from the set cover problem to the TSO problem.

- Our complete reduction from the set cover problem to $\mathrm{SB}^{\{0,1\}}$, described in Section 4.2, uses a composition of the above-mentioned reduction and another approximation-preserving reduction from the TSO problem to $\mathrm{SB}^{\{0,1\}}$.



4.1 Test Set with Order 

To make the approximation-preserving reduction from set cover to $\mathrm{SB}^{\{0,1\}}$ easier to follow, we introduce an intermediate problem called Test Set with Order with parameter $k \in \mathbb{N}$ (denoted by $\mathrm{TSO}^k$):

Instance: $(n, k, S)$ where $k$ is a positive integer, $(n, S)$ is an instance of $\mathrm{TS}^{\{1\}}$, and $S$ includes the family of "cheap" sets $S_0 = \{\{i\} \mid i \in [0, n-1]\} \cup \{[0, i] \mid i \in [0, n-1]\}$.

Valid solutions: a solution for the instance $(n, S)$ of $\mathrm{TS}^{\{1\}}$.

Objective: minimize $cost(\mathcal{T}) = |\mathcal{T} - S_0| + \frac{1}{k}|\mathcal{T} \cap S_0|$.
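A small sketch of the $\mathrm{TSO}^k$ objective, assuming tests are represented as frozensets over $[0, n-1]$ and using exact rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def cheap_sets(n):
    """S_0: all singletons {i} and all prefixes [0, i] of [0, n-1]."""
    singles = {frozenset({i}) for i in range(n)}
    prefixes = {frozenset(range(i + 1)) for i in range(n)}
    return singles | prefixes

def tso_cost(tests, n, k):
    """cost(T) = |T - S_0| + |T ∩ S_0| / k, computed exactly."""
    s0 = cheap_sets(n)
    tests = {frozenset(T) for T in tests}
    return len(tests - s0) + Fraction(len(tests & s0), k)

print(tso_cost([{0}, {0, 1}, {1, 3}], 4, 3))  # 5/3: {0} and [0, 1] are cheap
```

With $k = 1$ every test costs 1, so the objective reduces to $|\mathcal{T}|$, the objective of $\mathrm{TS}^{\{1\}}$.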
Note that $\mathrm{TSO}^1$ is in fact a special case of $\mathrm{TS}^{\{1\}}$ (for $k = 1$ the cost is simply $|\mathcal{T}|$); hence any hardness results proved for $\mathrm{TSO}^1$ apply to $\mathrm{TS}^{\{1\}}$ as well. Our claim follows once the following theorem is proved.

Theorem 3. For any integer constant $k > 0$ and any constant $0 < \rho < 1$, it is impossible to approximate $\mathrm{TSO}^k$ within a factor of $(1 - \rho)\ln n$ in polynomial time unless $\mathrm{NP} \subseteq \mathrm{DTIME}(n^{\log\log n})$.

In the rest of this section, we prove the above theorem. We need the following 
straightforward extension of the hardness result in [4] for a slightly restricted 
version of SC. 

Fact 6. Assuming $\mathrm{NP} \not\subseteq \mathrm{DTIME}(n^{\log\log n})$, instances of the SC problem for which the optimal cover requires at least $(\log_2 n)^2$ sets cannot be approximated to within a factor of $(1 - \varepsilon')\ln n$ for any constant $\varepsilon' > 0$ in polynomial time.

For notational simplicity, assume that $kn$ is an exact power of 2 and let $\ell = \log_2(kn)$. The following lemma gives a reduction from SC to the $\mathrm{TSO}^k$ problem.

Lemma 7. There exists a polynomial-time computable function $\tau$ that maps an instance $(n, S)$ of SC into an instance $(2kn, k, \tau(S))$ of $\mathrm{TSO}^k$ such that optimal solutions of $(n, S)$ and $(2kn, k, \tau(S))$, $C^*$ and $\mathcal{T}^*$ respectively, satisfy the following:

$$|C^*| \le cost(\mathcal{T}^*) \le |C^*| + \ell + 1.$$

Moreover, given any solution $X$ of $(2kn, k, \tau(S))$, we can in polynomial time construct a solution $Y$ of $(n, S)$ such that $|Y| \le cost(X)$.

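Proof. $\tau(S)$ contains the following sets:

cover sets: $D(S') = 2 \times (k \times S' + [0, k-1]) = \{2(ks + a) : s \in S',\ a \in [0, k-1]\}$ for $S' \in S$;

cheap sets: $\{i\}$ and $[0, i]$ for each $i \in [0, 2kn - 1]$;

other sets: $A_i = \{j \in [0, 2kn - 1] \mid j \bmod 2^{i+1} \ge 2^i\}$ (the numbers whose bit $i$ is 1) for $i \in [1, \ell]$.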

First, we show that $cost(\mathcal{T}^*) \le |C^*| + \ell$. Given a set cover $C$ of $(n, S)$, we define the following test set, which is a solution of $(2kn, k, \tau(S))$: $\mathcal{T} = \{D(A) \mid A \in C\} \cup \{A_i \mid i \in [1, \ell]\}$. To see that $\mathcal{T}$ is indeed a valid solution, consider $i, j \in [0, 2kn - 1]$. Suppose that $i$ is even and $j$ is not. Then for some $A \in C$ and $a \in [0, k-1]$ we have $(i - 2a)/2k \in A$, and thus $i \in D(A)$ while $j \notin D(A)$. On the other hand, if $i$ and $j$ have the same parity, then they differ on bit $b$ for some $b \in [1, \ell]$, in which case $i$ and $j$ are distinguished by test $A_b$. Since every test costs at most 1, applying this to an optimal cover $C^*$ gives $cost(\mathcal{T}^*) \le |\mathcal{T}| \le |C^*| + \ell$.
Next, we show that $|C^*| \le cost(\mathcal{T}^*)$. Given a set of tests $\mathcal{T}$, consider the partial cover $C' = \{A \mid D(A) \in \mathcal{T}\}$, and let $C = \bigcup_{A \in C'} A$. Consider $i \in [0, n-1] - C$. For $a \in [0, k-1]$ we know that some set of $\mathcal{T}$ distinguishes $2ki + 2a$ from $2ki + 2a + 1$. This distinguishing set can only be one of three sets: $\{2ki + 2a\}$, $\{2ki + 2a + 1\}$ or $[0, 2ki + 2a]$. Note that for each $i \in [0, n-1] - C$ and each $a \in [0, k-1]$ we have a choice among three different sets, so in each such case we use a different element of $\mathcal{T}$. We can conclude that $\mathcal{T}$ contains at least $k(n - |C|)$ such cheap sets, each of cost $1/k$, and thus $cost(\mathcal{T}) \ge |C'| + n - |C|$. Since for each $i \in [0, n-1]$, $\mathcal{T}$ must distinguish $2i - 1$ from $2i$, $\mathcal{T}$ must contain one of these three sets: $\{2i - 1\}$, $\{2i\}$, $[0, 2i - 1]$. Note that each $i \in [0, n-1] - C$ has different possibilities, thus for each of them $\mathcal{T}$ contains a different set of choices. We can therefore extend $C'$ to a cover $Y$ of $(n, S)$ by adding at most $n - |C|$ sets. Hence $|C^*| \le |Y| \le cost(\mathcal{T})$.

Hence $|C^*| \le cost(\mathcal{T}^*) \le |C^*| + \ell$, which proves Lemma 7.
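The construction of $\tau(S)$ and the first half of the proof can be exercised on a toy instance. In this sketch (hypothetical names; cheap sets are omitted because the cover-based solution does not use them), a set cover plus the $\ell$ bit tests $A_1, \ldots, A_\ell$ separates all pairs, while dropping part of the cover leaves pairs $2ki + 2a$, $2ki + 2a + 1$ that only cheap sets could separate.

```python
from itertools import combinations

def lemma7_instance(n, S, k):
    """Cover sets D(S') and bit tests A_1..A_ell of tau(S); assumes kn is a power of 2."""
    N = 2 * k * n
    ell = (k * n).bit_length() - 1
    assert 2 ** ell == k * n, "kn must be an exact power of 2"
    # D(S') = {2(ks + a) : s in S', a in [0, k-1]}
    D = {A: frozenset(2 * (k * s + a) for s in A for a in range(k)) for A in S}
    # A_i = {j : bit i of j is 1} for i in [1, ell]
    bits = [frozenset(j for j in range(N) if j % 2 ** (i + 1) >= 2 ** i)
            for i in range(1, ell + 1)]
    return N, D, bits

def separates_all(N, tests):
    """True iff every pair i < j of [0, N-1] is distinguished by some test."""
    return all(any((i in T) != (j in T) for T in tests)
               for i, j in combinations(range(N), 2))

S = [frozenset({0, 1}), frozenset({2, 3}), frozenset({1, 2})]
N, D, bits = lemma7_instance(4, S, 2)
cover = [S[0], S[1]]                                      # a set cover of [0, 3]
print(separates_all(N, [D[A] for A in cover] + bits))     # True, with |C| + ell = 5 tests
print(separates_all(N, [D[S[0]]] + bits))                 # False: 8 and 9 stay together
```

In the second call, elements 2 and 3 of the SC universe are uncovered, so pairs such as $8, 9$ differ only in bit 0 and are separated by none of the remaining tests.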

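We can now complete the proof of Theorem 3. Consider an instance of SC as mentioned in Fact 6, transform it to an instance of $\mathrm{TSO}^k$ as described in Lemma 7, and let $C^*$ and $\mathcal{T}^*$ be optimal solutions to the instances of SC and $\mathrm{TSO}^k$, respectively. Suppose that we can approximate $\mathrm{TSO}^k$ within a factor of $(1 - \rho)\ln n$ and let $\mathcal{T}'$ be such an approximate solution. Then, by using Lemma 7, we can find a solution $C'$ to the instance of SC such that

$$
\begin{aligned}
|C'| &\le cost(\mathcal{T}') \\
&\le (1 - \rho)\ln n \cdot cost(\mathcal{T}^*) \\
&\le (1 - \rho)\ln n\,(|C^*| + \ell + 1) \\
&\le (1 - \rho + o(1))\ln n\,|C^*|,
\end{aligned}
$$

since $\ell + 1 = O(\log n)$ is $o(|C^*|)$ for the instances of Fact 6. For sufficiently large $n$ this factor is at most $(1 - \rho/2)\ln n$, which violates Fact 6 with $\varepsilon' = \rho/2$.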

4.2 Proof of Theorem 5 for $\mathrm{SB}^{\{0,1\}}$

As before, for notational simplicity, assume that $kn$ is an exact power of 2 and let $\ell = \log_2(kn)$. First, using the reduction described in the proof of Lemma 7, we provide a reduction of SC to $\mathrm{SB}^{\{0,1\}}$.

Lemma 8. For any given constant integer $k > 0$, there exists a polynomial-time computable function $\sigma$ that maps an instance $(n, S)$ of SC into an instance $(2kn, \sigma(S))$ of $\mathrm{SB}^{\{0,1\}}$ so that if $C^*$ and $t^*$ are the optimal solutions for $(n, S)$ and $(2kn, \sigma(S))$, respectively, then

$$\frac{k}{k+1}|C^*| \le |t^*| \le |C^*| + \ell.$$

Moreover, given any solution $\mathbf{t}$ of $(2kn, \sigma(S))$, we can in polynomial time construct a solution $Y$ of $(n, S)$ such that $\frac{k}{k+1}|Y| \le |\mathbf{t}|$.

Proof. First, we define a family $\tau(S)$ of subsets of $[0, 2kn - 1]$ using the function $\tau$ from Lemma 7. Let $S_0$ be the family of "special" or "cheap" test sets, and $S_1 = \tau(S) - S_0$. We number the elements of $S_1$, so $S_1 = \{B_0, \ldots, B_{m-1}\}$, and let $B_m = [0, 2kn - 1] \in S_0$. For each $i \in [0, 2kn - 1]$ we define the sequence $s_i$ as a concatenation of alternating groups of $0^{i+1}$ and a distinct member of the set $\{1^{j+1} \mid i \in B_j\}$, beginning and ending with $0^{i+1}$. This completes the description of the function $\sigma$.

Consider an optimal set cover $C^*$ of $(n, S)$. As noted in the proof of Lemma 7, we can map it into a solution for $\mathrm{TSO}^k$ without using any cheap tests and with at most $|C^*| + \ell$ test sets. Then, we replace each test $B_j$ with the test sequence $01^{j+1}0$, which is a substring of $s_i$ exactly when $i \in B_j$. Thus $|t^*| \le |C^*| + \ell$.
Now consider a solution vector of sequences $\mathbf{t}$ for $\sigma(S)$. We show how to replace each sequence $t$ of $\mathbf{t}$ with at most two sets such that the following two statements hold:

(a) if $t$ is a substring of exactly one of two sequences $s_p$ and $s_q$, then the replacement sets {1}-distinguish $p$ from $q$;

(b) when we use two sets, one of them is cheap.

By (a), the replacement sets form a solution for the instance $(2kn, k, \tau(S))$ of $\mathrm{TSO}^k$. By (b), the cost of this solution for $(2kn, k, \tau(S))$ is at most $(1 + \frac{1}{k})|\mathbf{t}|$. Finally, by Lemma 7, it is possible to construct from this solution for $(2kn, k, \tau(S))$ a solution for the set cover instance $(n, S)$ with no more than $(1 + \frac{1}{k})|\mathbf{t}|$ sets. Hence, it only remains to show the replacement. We have the following cases:

Case 1: $t$ contains a substring $10^a1$ for some $a > 0$. Then $t$ can be a substring of only $s_{a-1}$, so we can replace $t$ with the cheap test $\{a - 1\}$.

Case 2: Otherwise, $t$ is of the form $0^*1^*0^*$.

Case 2.1: $t = 0^a$ for some $a > 0$. Then $t$ is a substring of exactly those $s_i$ with $i \ge a - 1$, and therefore we can replace it with the cheap test $[0, a - 2]$.

Case 2.2: $t = 0^a1^b$ for some $a, b > 0$. If $b > m + 1$, $t$ is not a substring of any $s_i$, so we can discard it. If $b \le m + 1$, then this test is equivalent to $0^a$ because every $s_i$ contains $1^{m+1}$.

Case 2.3: $t = 1^a0^b$ for some $a, b > 0$. Similar to Case 2.2.

Case 2.4: $t = 0^a1^b0^c$ where $a, b, c > 0$. Let $d = \max\{a, c\}$; one can see that we can replace $t$ with $B_{b-1}$ and $[0, d - 2]$.
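The sequences $s_i$ and the substring facts used in the proof can be checked mechanically. The sketch below builds $s_i$ for a hypothetical family $B_0, B_1$ together with $B_m = [0, N-1]$, then verifies that $01^{j+1}0$ occurs in $s_i$ exactly when $i \in B_j$, and that $0^a1^b0^c$ behaves like the pair of tests $B_{b-1}$ and $[0, d-2]$ from Case 2.4. The name `barcode_strings` is our own.

```python
def barcode_strings(N, B):
    """s_i = 0^{i+1} 1^{j_1+1} 0^{i+1} 1^{j_2+1} ... 0^{i+1} over all j with i in B[j].
    B[-1] is assumed to be the full set [0, N-1] (the set B_m in the text)."""
    strings = []
    for i in range(N):
        zeros = "0" * (i + 1)
        ones = ["1" * (j + 1) for j, Bj in enumerate(B) if i in Bj]
        strings.append(zeros + zeros.join(ones) + zeros)
    return strings

N = 6
B = [{0, 2, 5}, {1, 2, 3}, set(range(N))]   # hypothetical B_0, B_1, and B_m = [0, N-1]
s = barcode_strings(N, B)

# 0 1^{j+1} 0 is a substring of s_i exactly when i is in B_j
for j, Bj in enumerate(B):
    t = "0" + "1" * (j + 1) + "0"
    assert all((t in si) == (i in Bj) for i, si in enumerate(s))

# Case 2.4: 0^a 1^b 0^c occurs in s_i iff i is in B_{b-1} and i is not in [0, d-2]
for a in range(1, N + 2):
    for b in range(1, len(B) + 1):
        for c in range(1, N + 2):
            t = "0" * a + "1" * b + "0" * c
            d = max(a, c)
            assert all((t in si) == (i in B[b - 1] and i > d - 2) for i, si in enumerate(s))

print(s[0])  # 0101110
```

Every run of ones in $s_i$ is flanked on both sides by exactly $i + 1$ zeros, which is what makes both checks go through.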

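We can now complete the proof of our claim in a manner similar to that in the proof of Theorem 3. Consider an instance of SC as mentioned in Fact 6, transform it to an instance of $\mathrm{SB}^{\{0,1\}}$ as described in Lemma 8, and let $C^*$ and $t^*$ be optimal solutions to the instances of SC and $\mathrm{SB}^{\{0,1\}}$, respectively. Suppose that we can approximate $\mathrm{SB}^{\{0,1\}}$ within a factor of $(1 - \rho)\ln n$ and let $t'$ be such an approximate solution. Then, by using Lemma 8 we can find a solution $C'$ to the instance of SC such that

$$
\begin{aligned}
|C'| &\le \left(1 + \tfrac{1}{k}\right)|t'| \\
&\le \left(1 + \tfrac{1}{k}\right)(1 - \rho)\ln n\,|t^*| \\
&\le \left(1 + \tfrac{1}{k}\right)(1 - \rho)\ln n\,(|C^*| + \ell + 1) \\
&\le \left(1 - \rho + \tfrac{1 - \rho}{k} + o(1)\right)\ln n\,|C^*|,
\end{aligned}
$$

since $\ell + 1 = O(\log n)$ is $o(|C^*|)$ for the instances of Fact 6. Choosing the constant $k$ large enough that $(1 - \rho)/k \le \rho/4$, the factor is at most $(1 - \rho/2)\ln n$ for sufficiently large $n$, which violates Fact 6 with $\varepsilon' = \rho/2$.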

5 Stronger Inapproximabilities for TS^^^(fc), 
and 

Theorem 9. 

(a) For any two given constants $0 < \rho < \delta < 1$, … and …

cannot be approximated to within a factor of … in polynomial time unless co-RP = NP.

(b) The result in (a) also holds for SB^{n^) ift)<p<5<\. 
Abstract. Consider a graph problem which is associated with a parameter, for example, that of finding a longest tour spanning $k$ vertices. The following question is natural: Is there a small subgraph which contains an optimal or near optimal solution for every possible value of the given parameter? Such a subgraph is said to be robust. In this paper we consider the problems of finding heavy paths and heavy trees of $k$ edges. In these two cases we prove surprising bounds on the size of a robust subgraph for a variety of approximation ratios. For both problems we show that in every complete weighted graph on $n$ vertices there exists a subgraph with approximately $\frac{\alpha}{1-\alpha^2}n$ edges which contains an $\alpha$-approximate solution for every $k = 1, \ldots, n-1$. In the analysis of the tree problem we also describe a new result regarding balanced decomposition of trees. In addition, we consider variations in which the subgraph itself is restricted to be a path or a tree. For these problems we describe polynomial time algorithms and corresponding proofs of negative results.



1 Introduction 

Consider an optimization problem that requires finding, for a given vertex-weighted or edge-weighted graph $G = (V, E)$, a subgraph $H$ of minimum or maximum weight which meets several design criteria. In many such problems one of these criteria is specified by a parameter $k$, which often expresses a bound on the size of the subgraph. Some extensively studied problems of this type are the $k$-CENTER and $k$-MEDIAN problems, in which $H$ is a collection of $k$ stars spanning $V$ (see, for example, [MF90]); $k$-MST, in which $H$ is a tree on $k$ vertices [AK00]; $k$-TSP, in which $H$ is a simple cycle on $k$ vertices [G96]; optimal DISPERSION, in which $H$ is a clique of $k$ vertices [FKP01,HRT97]; and many more.

When studying a problem of this type, a fundamental question is the existence of a small-sized robust subgraph, that is, a subgraph which contains an optimal or near optimal solution for every possible value of the parameter $k$. This subgraph can be viewed as a compact data structure: whenever a value of $k$ is specified, we extract the solution $H$ from this subgraph. In problems where a robust subgraph exists, we are also interested in finding and maintaining such a subgraph by applying low complexity algorithms.

In this paper we address two of the most basic problems, those of finding small robust subgraphs which contain heavy paths and heavy trees of any size from 1 to $|V|$. In these two cases we prove surprising bounds on the size of a
robust subgraph for a variety of approximation ratios. In the analysis of the 
tree problem we also describe a new result regarding balanced decomposition 
of trees. This result is of independent interest, and we hope that it might have 
applications other than the one in this paper. 

We also consider variations in which the subgraph itself is restricted to be a path or a tree. We describe polynomial time algorithms for the problems of finding a path-robust path and a tree-robust tree. These algorithms constructively prove several existence theorems, and are accompanied by corresponding proofs of negative results. In addition, we present a polynomial time algorithm which finds a tree-robust subgraph that is near optimal with respect to cardinality and total weight simultaneously.



1.1 Basic Definitions 

Let $G = (V, E)$ be a complete graph with vertex set $V = \{1, \ldots, n\}$ and non-negative weights $w(e)$, $e \in E$. For a subgraph $H \subseteq G$ we denote $|H| = |E(H)|$ and $w(H) = \sum_{e \in E(H)} w(e)$. For a set of edges $E' \subseteq E$ we denote by $G[E']$ the subgraph of $G$ whose vertices are the endpoints of edges in $E'$ and whose edges are $E'$.

For $1 \le k \le n - 1$ let $P_k^*$ and $T_k^*$ be a maximum weight $k$-edge path and a maximum weight $k$-edge tree in $G$, respectively. A subgraph $H \subseteq G$ is called $\alpha$-path-robust if for every $1 \le k \le n - 1$ there exists a path $P \subseteq H$ such that $|P| \le k$ and $w(P) \ge \alpha w(P_k^*)$. $H$ is called $\alpha$-tree-robust if for every $1 \le k \le n - 1$ there exists a tree $T \subseteq H$ such that $|T| \le k$ and $w(T) \ge \alpha w(T_k^*)$.

For $0 < \alpha < 1$, let $p^\alpha(G, w)$ and $t^\alpha(G, w)$ be the minimum number of edges in an $\alpha$-path-robust subgraph and an $\alpha$-tree-robust subgraph of $G$, respectively. We define

$$p_n^\alpha = \max_{(G,w)\,:\,|V(G)| = n} p^\alpha(G, w), \qquad t_n^\alpha = \max_{(G,w)\,:\,|V(G)| = n} t^\alpha(G, w).$$



1.2 Results and Techniques 

As a first attempt at attacking the problems of finding small $\alpha$-path-robust and $\alpha$-tree-robust subgraphs, one might consider using a maximum weight Hamiltonian path and a maximum spanning tree, respectively. However, simple examples illustrate that there are graphs in which these subgraphs are not $\alpha$-path-robust or $\alpha$-tree-robust for any $\alpha > 0$.

In Section 2 we show that for $\alpha < 1$, $\limsup_{n \to \infty} \frac{p_n^\alpha}{n} \le \frac{\alpha}{1 - \alpha^2}$, which follows from an upper bound on the number of edges in a minimum $\alpha$-path-robust subgraph. A general overview of the method we use to obtain this bound is the following:

1. Let $\mathcal{P} = \{P_1^*, \ldots, P_{n-1}^*\}$.

2. We first define a collection of subsets of $\mathcal{P}$, $\{\mathcal{P}_i\}_{i \in I}$, such that $\bigcup_{i \in I} \mathcal{P}_i = \mathcal{P}$. These subsets are not necessarily disjoint.




Robust Subgraphs for Trees and Paths 






3. For each $i \in I$, we independently find a subgraph $H_i \subseteq G$ which $\alpha$-covers $\mathcal{P}_i$, that is, for every $P \in \mathcal{P}_i$ there exists a path $P' \subseteq H_i$ such that $|P'| \le |P|$ and $w(P') \ge \alpha w(P)$.

4. Let $H = \bigcup_{i \in I} H_i$; then $|H|$ can be used as an upper bound on the number of edges in a minimum $\alpha$-path-robust subgraph.

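In Section 3 we prove a corresponding bound of $\limsup_{n \to \infty} \frac{t_n^\alpha}{n} \le \frac{\alpha}{1 - \alpha^2}$ for the minimum $\alpha$-tree-robust subgraph problem. As before, we define a collection of subsets $\{\mathcal{T}_i\}_{i \in I}$ of $\mathcal{T} = \{T_1^*, \ldots, T_{n-1}^*\}$ such that $\bigcup_{i \in I} \mathcal{T}_i = \mathcal{T}$. For every $i \in I$ an $\alpha$-cover $H_i \subseteq G$ is found for $\mathcal{T}_i$, and $|H|$ is used as an upper bound on the number of edges in a minimum $\alpha$-tree-robust subgraph, where $H = \bigcup_{i \in I} H_i$.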

However, there are two major differences between the analysis in this case and the analysis of the minimum $\alpha$-path-robust subgraph. One difference is that throughout our analysis we consider only rational values of $\alpha$. The result we obtain is then extended to arbitrary values of $\alpha$ using a simple analytical argument. Another difference is that the method we apply to find the subgraphs $H_i$, which $\alpha$-cover the subsets of trees we define, is much more involved. The existence of small enough subgraphs $H_i$ is proved using tree decomposition schemes.

Definition 1. A tree decomposition scheme is a triple $(r, \rho, c)$, $r \in \mathbb{N}$, $\frac{1}{r} \le \rho < 1$, $c \in \mathbb{N}$, such that for every $n$-edge tree $T$ there exist $E_1, \ldots, E_r \subseteq E(T)$ that satisfy the following conditions:

1. $\{E_1, \ldots, E_r\}$ is a partition of $E(T)$.

2. $|E_i| \le \lfloor \rho n \rfloor + 1$, for every $1 \le i \le r$.

3. The combined number of connected components in $T[E_1], \ldots, T[E_r]$ is at most $c$.

In Section 3 we also prove that for every $r \in \mathbb{N}$ and $\varepsilon > 0$ there exists a tree decomposition scheme $(r, \rho, c)$ which satisfies $\rho \le \frac{1}{r} + \varepsilon$ and $c = O(r \log \frac{1}{\varepsilon})$. We also present a polynomial time algorithm which finds such a decomposition.

In Section 4 we first present a polynomial time algorithm which finds a 
path-robust path. We also provide an example for a graph in which no path 
is a-path-robust for a. > 0.843. We then describe a polynomial time algorithm 
which finds a |-tree-robust tree, and provide an example for a graph in which 
no tree is a-tree-robust for a > 0.866. Finally, we present a polynomial time 
algorithm which finds a ^-tree-robust subgraph H, such that \H\ is at most 
twice the number of edges in a minimum cardinality ^-tree-robust subgraph, and 
w{H) is at most twice the weight of a minimum weight ^-tree-robust subgraph. 

2 $\alpha$-Path-Robust Subgraphs

Before we describe how our general method is applied to obtain the bound on the number of edges in a minimum $\alpha$-path-robust subgraph, we first present two helpful lemmas on which the following discussion is based.

Lemma 1. Let $0 < \alpha < 1$ and $k > \ldots$. Then $w(P^*_{\lceil \alpha(k+1) \rceil}) \ge \alpha\, w(P_k^*)$.







R. Hassin and D. Segev 



Lemma 2. Let $0 < \alpha < 1$ and $k > \ldots$. There exist $E' \subseteq E(P_k^*)$ and $A' \subseteq E(G)$, $|A'| \le 1$, such that

1. $P = G[E' \cup A']$ is a path.

2. $|P| \le \lceil \alpha(k+1) \rceil$.

3. $w(P) \ge \alpha\, w(P_k^*)$.

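Given $(G, w)$, $|V(G)| = n$, and $0 < \alpha < 1$, we define the collection of subsets of $\mathcal{P} = \{P_1^*, \ldots, P_{n-1}^*\}$ as follows. For $i \le j$, let $\mathcal{P}[i, j]$ denote the "interval" of paths $\{P_i^*, \ldots, P_j^*\}$. We define $\mathcal{P}' = \mathcal{P}[1, \ldots]$. In addition, we define $\mathcal{P}_1, \ldots, \mathcal{P}_R$, where $\mathcal{P}_i = \mathcal{P}[L_i, U_i]$, such that

1. $U_1 = n - 1$.

2. $U_i = L_{i-1} - 1$, for every $i \ge 2$.

3. $L_i = \lceil \alpha(\lceil \alpha(U_i + 1) \rceil + 1) \rceil$, for every $i \ge 1$.

$R$ is chosen such that $L_R < \ldots$; therefore $\mathcal{P} = \mathcal{P}' \cup \left(\bigcup_{i=1}^{R} \mathcal{P}_i\right)$.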
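The interval construction can be traced numerically. This is a minimal sketch under an explicit assumption: the paper's cut-off for $L_R$ is not reproduced here, so a hypothetical parameter `threshold` stands in for it and must be large enough for the intervals to keep shrinking. For each interval the sketch also reports the bound $\lceil \alpha(U_i + 1) \rceil + 1$ on $|H_i|$.

```python
import math

def path_intervals(n, alpha, threshold):
    """Intervals [L_i, U_i]: U_1 = n - 1, U_i = L_{i-1} - 1,
    L_i = ceil(alpha * (ceil(alpha * (U_i + 1)) + 1)), until L_R < threshold."""
    out, U = [], n - 1
    while True:
        inner = math.ceil(alpha * (U + 1))       # number of edges of the covering path
        L = math.ceil(alpha * (inner + 1))
        if L > U:
            raise ValueError("threshold too small for this alpha")
        out.append((L, U, inner + 1))            # (L_i, U_i, bound on |H_i|)
        if L < threshold:
            return out
        U = L - 1

for L, U, h in path_intervals(1000, 0.5, 10):
    print(L, U, h)
```

Consecutive upper ends shrink by a factor of roughly $\alpha^2$ (here 999, 250, 63, 16), which is why the sizes of the $H_i$ form a geometric series.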

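We $\alpha$-cover $\mathcal{P}'$ using $H' = \bigcup_{P \in \mathcal{P}'} P$, which is a subgraph whose number of edges is bounded by …. Note that when $k > \ldots$ we have $\lceil \alpha(k+1) \rceil \le k$ and $k > \ldots$. Therefore, by Lemma 1 we can use $P^*_{\lceil \alpha(U_i+1) \rceil}$ as an $\alpha$-cover for $\mathcal{P}[\lceil \alpha(U_i+1) \rceil, U_i]$. In addition, when $k > \ldots$ we have $\lceil \alpha(\lceil \alpha(k+1) \rceil + 1) \rceil \le \lceil \alpha(k+1) \rceil$ and $\lceil \alpha(k+1) \rceil > \ldots$. Therefore, by Lemma 2 it is sufficient to add at most one edge to $P^*_{\lceil \alpha(U_i+1) \rceil}$ such that it becomes an $\alpha$-cover for $\mathcal{P}[\lceil \alpha(\lceil \alpha(U_i+1) \rceil + 1) \rceil, \lceil \alpha(U_i+1) \rceil] = \mathcal{P}[L_i, \lceil \alpha(U_i+1) \rceil]$. It follows that $\mathcal{P}_i = \mathcal{P}[L_i, U_i]$ can be $\alpha$-covered by a subgraph $H_i \subseteq G$ such that $|H_i| \le \lceil \alpha(U_i+1) \rceil + 1$.

Let $H = H' \cup \left(\bigcup_{i=1}^{R} H_i\right)$; then $H$ is an $\alpha$-path-robust subgraph.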

Lemma 3. $U_i \le \ldots$ for every $i \ge 1$.

Lemma 4. $R \le \log_{\alpha^{-2}}(\ldots)$.

Lemma 5. $|H_i| \le \ldots + 3$ for every $i \ge 1$.

Lemma 6. $|H| \le \ldots$

Theorem 1. $\displaystyle\limsup_{n \to \infty} \frac{p_n^\alpha}{n} \le \frac{\alpha}{1 - \alpha^2}$.
3 $\alpha$-Tree-Robust Subgraphs

We follow the general scheme outlined in Section 1 and define a collection of subsets $\{\mathcal{T}_i\}_{i \in I}$ of $\mathcal{T} = \{T_1^*, \ldots, T_{n-1}^*\}$ such that $\bigcup_{i \in I} \mathcal{T}_i = \mathcal{T}$. We then independently find for every $i \in I$ a subgraph $H_i \subseteq G$ which $\alpha$-covers $\mathcal{T}_i$. However, finding a small enough subgraph $H_i$ is significantly harder in this case.

To demonstrate this difficulty, consider for example $\alpha = \frac{1}{2}$. Suppose we want to prove that the weight of a maximum weight tree with approximately $k$ edges is at least $\frac{1}{2} w(T_{2k}^*)$. In the paths case, by Lemma 1 we have $w(P_{k+1}^*) \ge \frac{1}{2} w(P_{2k}^*)$. Actually, since a $2k$-edge path can be represented as a union of two disjoint $k$-edge paths, a slightly stronger result holds, namely that $w(P_k^*) \ge \frac{1}{2} w(P_{2k}^*)$. However, a $2k$-edge tree cannot generally be decomposed into equal size trees. In addition, simple examples show that there are graphs in which $w(T_k^*) < \frac{1}{2} w(T_{2k}^*)$.

Therefore, we do not expect that simple arguments can be used to prove lemmas analogous to Lemmas 1 and 2. In the following we study tree decomposition schemes, which allow us to achieve matching results.

3.1 Tree Decomposition Schemes 

Our main result in this section is the following: 

Theorem 2. For every $r \in \mathbb{N}$ and $\varepsilon > 0$ there exists a tree decomposition scheme $(r, \rho, c)$ which satisfies $\rho \le \frac{1}{r} + \varepsilon$ and $c = O(r \log \frac{1}{\varepsilon})$. Moreover, such a decomposition can be found in $O(rn)$ time.

Before we turn to describe our tree decomposition algorithm, which provides a constructive proof for Theorem 2, we present a well-known result regarding centroid decomposition in trees.

Definition 2. A centroid $u$ of a tree $T$ is a vertex which minimizes, over all vertices $v$, the size of the largest connected component of $T - v$.

Definition 3. Let $T$ be a tree. A partition $(T', T'')$ of $T$ is called a centroid decomposition of $T$ if $T'$ and $T''$ are edge-disjoint subtrees of $T$ such that $\frac{1}{3}|T| \le |T'|, |T''| \le \frac{2}{3}|T|$ and $V(T') \cap V(T'') = \{u\}$, where $u$ is a centroid of $T$.

Lemma 7 ([FJ80]). Let $T$ be a tree with $n > 2$ vertices. A centroid decomposition of $T$ exists, and can be found in $O(n)$ time.
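Lemma 7 is cited from [FJ80] without details. The following Python sketch shows one standard way to obtain such a split, which is our assumption rather than the construction of [FJ80]: pick a vertex minimizing the largest component of $T - v$, cut $T$ at it into branches, and greedily group the branches (largest first) until one group holds at least a third of the edges. Since each branch has at most $(|T|+1)/2$ edges, both groups end up with between $\frac{1}{3}|T|$ and $\frac{2}{3}|T|$ edges. Edges are $(u, v)$ tuples and `centroid_split` is a hypothetical name.

```python
from collections import defaultdict

def centroid_split(edges):
    """Split a tree with m >= 2 edges into edge-disjoint subtrees (T1, T2) sharing only
    a centroid, with m/3 <= |T1| <= |T2| <= 2m/3."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    nodes = list(adj)
    n = len(nodes)
    # BFS from an arbitrary root (the list grows while we iterate over it)
    parent, order = {nodes[0]: None}, [nodes[0]]
    for x in order:
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                order.append(y)
    size = dict.fromkeys(nodes, 1)            # subtree sizes, computed bottom-up
    for x in reversed(order[1:]):
        size[parent[x]] += size[x]

    def largest_component(v):
        # components of T - v: child subtrees plus everything above v
        parts = [size[c] for c in adj[v] if parent[c] == v]
        return max(parts + [n - size[v]])

    u = min(nodes, key=largest_component)
    branches = []                             # one branch per component of T - u
    for c in adj[u]:
        branch, seen, stack = [(u, c)], {u, c}, [c]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    branch.append((x, y))
                    stack.append(y)
        branches.append(branch)
    m = len(edges)
    first, second = [], []
    for b in sorted(branches, key=len, reverse=True):
        (first if 3 * len(first) < m else second).extend(b)
    return (first, second) if len(first) <= len(second) else (second, first)

a, b = centroid_split([(0, i) for i in range(1, 6)])   # a star with 5 edges
print(len(a), len(b))  # 2 3
```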

Our decomposition algorithm uses the procedure Accumulate-Edges, shown in Algorithm 1. The input for this procedure consists of: a lower bound $L$; an upper bound $U$; a set of edges $A$; and a tree $T$. We assume that Accumulate-Edges is called with $L + 1 \le U$, $A \cap E(T) = \emptyset$, $|A| < L$ and $|A| + |T| \ge L$.

Accumulate-Edges recursively transfers edges from the tree $T$ to the edge set $A$ until $L \le |A| \le U$. If we initially have $|A| + |T| \le U$, then $A \cup T$ is returned. If this condition is not satisfied, we find a centroid decomposition $(T', T'')$ of $T$,
Accumulate-Edges(L, U, A, T)
if |A| + |T| ≤ U then
  A ← A ∪ T
  return (A, ∅)
(T', T'') ← centroid decomposition of T, where |T'| ≤ |T''|
if |A| + |T'| ≤ U then
  A ← A ∪ T'
  if |A| ≥ L then
    return (A, {T''})
  else
    return Accumulate-Edges(L, U, A, T'')
(A', ST) ← Accumulate-Edges(L, U, A, T')
return (A', ST ∪ {T''})

Algorithm 1: The Accumulate-Edges procedure



where $|T'| \le |T''|$. If the smaller tree, $T'$, can be added to $A$ without exceeding the upper bound $U$, we replace $A$ by $A \cup T'$ and continue to transfer edges from $T''$. Otherwise, it is sufficient to transfer edges from $T'$ to $A$. This procedure also returns the collection of subtrees of $T$ which were not transferred to $A$.

Note that the original bounds $L$ and $U$ are kept unchanged in the following recursive calls. Let $A_i$ and $T_i$ be the additional input for the $i$-th call to Accumulate-Edges. Since a centroid decomposition is performed in each call, $|T_i| \le \max\{1, (\frac{2}{3})^{i-1}|T_1|\}$ for every $i \ge 1$. In addition, when a recursive call is made, the procedure guarantees that there is always a sufficient number of edges in $T_i$ for the procedure to terminate, that is, $|A_i| + |T_i| \ge L$ for every $i \ge 1$.



Lemma 8. Procedure Accumulate-Edges performs at most $O\left(\log \frac{|T_1|}{U - L}\right)$ recursive calls, and returns a set of edges $A'$ such that $A_1 \subseteq A'$ and $L \le |A'| \le U$.

Lemma 9. Accumulate-Edges runs in $O(|T_1|)$ time.

Given an $n$-edge tree $T$, $r \in \mathbb{N}$ and $\varepsilon > 0$, we apply algorithm r-Decomp, shown in Algorithm 2, to find an $(r, \rho, c)$-decomposition of $T$ such that $\rho \le \frac{1}{r} + \varepsilon$ and $c = O(r \log \frac{1}{\varepsilon})$.

The algorithm initially sets the lower bound $L = \frac{1}{r}n$ and the upper bound $U = \frac{1}{r}n + \max\{1, \varepsilon n\}$. The edge sets $E_1, \ldots, E_r$, which will define the decomposition when the algorithm terminates, are initially empty. These sets are built in order of increasing index, where in iteration $i$ of the for loop edges are transferred from $E(T) \setminus (E_1 \cup \cdots \cup E_{i-1})$ to $E_i$. The algorithm keeps in SUBT a collection of disjoint subtrees of $T$ which contain the edges that were not used up to this point. Initially SUBT contains $T$.

The set $E_i$ is built in two stages. First, a set $A^i$ is constructed by collecting subtrees from SUBT. Subtrees are transferred from SUBT to $A^i$ until none remain or until the next subtree, $T^i$, satisfies $|A^i| + |T^i| \ge L$. If $T^i = \emptyset$, we set
r-Decomp(T, r, ε)
n ← |T|
L ← n/r,  U ← n/r + max{1, εn}
SUBT ← {T}
for i ← 1 to r do
  E_i ← ∅, A^i ← ∅, T^i ← ∅
  while SUBT ≠ ∅ do
    T^i ← a subtree in SUBT
    SUBT ← SUBT \ {T^i}
    if |A^i| + |T^i| ≥ L then
      break while
    A^i ← A^i ∪ T^i
    T^i ← ∅
  if T^i ≠ ∅ then
    (E_i, ST) ← Accumulate-Edges(L, U, A^i, T^i)
    SUBT ← SUBT ∪ ST
  else
    E_i ← A^i
return {E_1, ..., E_r}

Algorithm 2: The r-Decomp algorithm



$E_i = A^i$. Otherwise, a more refined stage begins, in which edges are transferred from $T^i$ to $A^i$ using the Accumulate-Edges procedure.

Since at this time we have $L + 1 \le U$, $A^i \cap E(T^i) = \emptyset$, $|A^i| < L$ and $|A^i| + |T^i| \ge L$, the initial arguments for Accumulate-Edges satisfy our assumptions. By Lemma 8, we obtain a set $E_i \subseteq A^i \cup T^i$ such that $A^i \subseteq E_i$ and $L \le |E_i| \le U$. Subtrees of $T^i$ which were not added to $A^i$ are returned to SUBT. Note that it is generally possible that there exists $1 \le j \le r$ for which $E_i = \emptyset$ for every $j \le i \le r$.

Therefore, at the end of this process $\{E_1, \ldots, E_r\}$ is indeed a partition of $E(T)$, and for every $1 \le i \le r$

$$|E_i| \le U = \frac{1}{r}n + \max\{1, \varepsilon n\} \le \left(\frac{1}{r} + \varepsilon\right)n + 1.$$

Since $|E_i|$ is an integer, $|E_i| \le \lfloor (\frac{1}{r} + \varepsilon)n \rfloor + 1$. In addition, the combined number of connected components in $T[E_1], \ldots, T[E_r]$ is equal to the overall number of centroid decompositions performed plus one. By Lemma 8, if Accumulate-Edges is called in iteration $i$ of the for loop, it terminates within $O\left(\log \frac{|T^i|}{U - L}\right)$ recursive calls. Since $|T^i| \le n$ and $U - L = \max\{1, \varepsilon n\}$, we have $\frac{|T^i|}{U - L} \le \frac{1}{\varepsilon}$. This implies that the overall number of centroid decompositions is $O(r \log \frac{1}{\varepsilon})$. In addition, by Lemma 9, algorithm r-Decomp runs in $O(rn)$ time. Theorem 2 follows.
3.2 A Bound on the Size of a Minimum α-Tree-Robust Subgraph

The following two lemmas will subsequently be used to obtain our upper bound on the number of edges in a minimum α-tree-robust subgraph.

Lemma 10. Let $(r, p, c)$ be a tree decomposition scheme. Let $1 \le z \le r - 1$.
Then 

Lemma 11. Let $(r, p, c)$ be a tree decomposition scheme. Let $1 \le z \le r - 1$. There exist $E' \subseteq E(T^*_k)$ and $A' \subseteq E(G)$, $|A'| \le c$, such that:

1. The set of edges $A'$ completes $T^*_k[E']$ to a tree $T$.

2. $|T| \le z(\lfloor pk \rfloor + 1) + c$.

3. $w(T) \ge \frac{z}{r} w(T^*_k)$.

We now prove an upper bound on the number of edges in a minimum α-tree-robust subgraph when $0 < \alpha < 1$ is rational. In this case we can write $\alpha = \frac{z}{r}$, where $z, r \in \mathbb{N}$ and $1 \le z \le r - 1$. Assume that $(r, p, c)$ is a tree decomposition scheme which satisfies $zp < 1$.

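We first define a collection of subsets of $\mathcal{T} = \{T^*_1, \ldots, T^*_{n-1}\}$ as follows. For $i \le j$, let $\mathcal{T}[i, j]$ denote the "interval" of trees $\{T^*_i, \ldots, T^*_j\}$. We define $\mathcal{T}' = \mathcal{T}[1, \ldots - 1]$. In addition, we define $\mathcal{T}_1, \ldots, \mathcal{T}_R$, where $\mathcal{T}_i = \mathcal{T}[L_i, U_i]$, such that

1. $U_1 = n - 1$.

2. $U_i = L_{i-1} - 1$, for every $i \ge 2$.

3. $L_i = z(\lfloor p(z(\lfloor pU_i \rfloor + 1) + c) \rfloor + 1) + c$, for every $i \ge 1$.

$R$ is chosen such that $L_R \le \ldots$; therefore $\mathcal{T} = \mathcal{T}' \cup (\bigcup_{i=1}^{R} \mathcal{T}_i)$.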

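We $\frac{z}{r}$-cover $\mathcal{T}'$ using $H' = \bigcup_{T \in \mathcal{T}'} T$, which is a subgraph whose number of edges is bounded by … . Note that when $k > \lceil \frac{z+c}{1-zp} \rceil$ we have $z(\lfloor pk \rfloor + 1) + c < k$. Therefore, by Lemma 10 we can use … as a $\frac{z}{r}$-cover for $\mathcal{T}[z(\lfloor pU_i \rfloor + 1) + c, U_i]$. In addition, when $k > \ldots$ we have $z(\lfloor p(z(\lfloor pk \rfloor + 1) + c) \rfloor + 1) + c < z(\lfloor pk \rfloor + 1) + c$. Therefore, by Lemma 11 it is sufficient to add at most $c$ edges to … such that it becomes a $\frac{z}{r}$-cover for $\mathcal{T}[z(\lfloor p(z(\lfloor pU_i \rfloor + 1) + c) \rfloor + 1) + c,\ z(\lfloor pU_i \rfloor + 1) + c] = \mathcal{T}[L_i,\ z(\lfloor pU_i \rfloor + 1) + c]$. It follows that $\mathcal{T}_i = \mathcal{T}[L_i, U_i]$ can be $\frac{z}{r}$-covered by a subgraph $H_i \subseteq G$ such that $|H_i| \le z(\lfloor pU_i \rfloor + 1) + 2c$.

If we define $H = H' \cup (\bigcup_{i=1}^{R} H_i)$, then $H$ is a $\frac{z}{r}$-tree-robust subgraph.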
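The recursion for $U_i$ and $L_i$ can be tabulated mechanically. The Python sketch below is our own illustration: the helper names `f` and `intervals` are hypothetical, and the parameter values in the example are arbitrary rather than taken from a specific tree decomposition scheme. It uses exact rational arithmetic so that the floors carry no rounding error, and it stops at the first $L_i \le \frac{z+c}{1-zp}$; that stopping rule is our assumption. The check that $f(k) < k$ reflects the inequality $z(\lfloor pk \rfloor + 1) + c \le zpk + z + c < k$, which holds exactly when $k(1 - zp) > z + c$.

```python
from fractions import Fraction
from math import floor


def f(k, z, p, c):
    # One application of the size bound z(floor(p*k) + 1) + c.
    return z * (floor(p * k) + 1) + c


def intervals(n, z, p, c):
    """Pairs (L_i, U_i) with U_1 = n - 1, U_i = L_{i-1} - 1, L_i = f(f(U_i)).

    Stopping at the first L_i <= (z + c) / (1 - zp) is an assumption made
    for this illustration."""
    assert z * p < 1
    threshold = Fraction(z + c) / (1 - z * p)
    result = []
    upper = n - 1
    while True:
        lower = f(f(upper, z, p, c), z, p, c)
        result.append((lower, upper))
        if lower <= threshold:
            return result
        upper = lower - 1  # consecutive intervals tile [L_R, n - 1]
```

With $n = 100$, $z = 1$, $p = 3/10$ and $c = 1$ (illustrative values only) the sketch returns the intervals $[11, 99]$, $[3, 10]$ and $[2, 2]$, each upper end lying one below the previous lower end, as condition 2 requires.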
Lemma 12. $U_i \le (zp)^{2(i-1)} n + (z + c) \sum_{j=0}^{2i-3} (zp)^j$, for every $i \ge 1$.
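The bound of Lemma 12 can be checked by unrolling the recursion for $U_i$. The following derivation is our own reconstruction, using only $\lfloor x \rfloor \le x$ together with the definitions of $L_i$ and $U_i$:

```latex
\begin{aligned}
U_{i+1} = L_i - 1
  &= z\bigl(\lfloor p\,(z(\lfloor pU_i \rfloor + 1) + c) \rfloor + 1\bigr) + c - 1 \\
  &\le z\bigl(p\,(z(pU_i + 1) + c) + 1\bigr) + c - 1 \\
  &= (zp)^2 U_i + zp\,(z + c) + (z + c) - 1 \\
  &\le (zp)^2 U_i + (z + c)(1 + zp).
\end{aligned}
```

Iterating from $U_1 = n - 1 \le n$ and using $(1 + zp)\sum_{j=0}^{i-2} (zp)^{2j} = \sum_{j=0}^{2i-3} (zp)^j$ gives $U_i \le (zp)^{2(i-1)} n + (z + c)\sum_{j=0}^{2i-3} (zp)^j$; for $i = 1$ the sum is empty.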



Lemma 13. $R \le \lceil \log_{(zp)^{-2}} pn \rceil$.
Lemma 14. $|H_i| \le (zp)^{2i-1} n + (z + 2c) \sum_{j=0}^{2i-2} (zp)^j$, for every $i \ge 1$.

Lemma 15. $|H| \le \ldots \lceil \log_{(zp)^{-2}} pn \rceil \ldots \frac{1 + cp}{p(1 - zp)} \ldots$

Lemma 16. $\limsup_{n \to \infty} \ldots$



Theorem 3 generalizes the bound in Lemma 16 to arbitrary values of α.

Theorem 3. $\limsup_{n \to \infty} \frac{\ldots}{n} \le \frac{\alpha}{1 - \alpha^2}$, for every $0 < \alpha < 1$.



4 Polynomial Time Algorithms 

4.1 An Algorithm for 1/(2√2)-Path-Robust Path

We now prove that every complete weighted graph contains a $\frac{1}{2\sqrt{2}}$-path-robust path, which can be found in polynomial time. We also show that there exists a complete weighted graph in which no path is α-path-robust for $\alpha > 0.843$. Our algorithm for finding a $\frac{1}{2\sqrt{2}}$-path-robust path is based on robust matchings.

Definition 4. A perfect matching $M$ that, for every $1 \le p \le |M|$, contains $p$ edges whose total weight is at least $\alpha$ times the maximum weight of a $p$-matching is called an α-robust matching.

Theorem 4 ([HR02]). Let $M$ be a maximum perfect matching with respect to the squared weights $w^2$. Then $M$ is a $\frac{1}{\sqrt{2}}$-robust matching.

Given a complete weighted graph $(G, w)$, $|V(G)| = n$, algorithm Connect-Matching, shown in Algorithm 3, first finds $M^*$, a maximum perfect matching in $G$ with respect to $w^2$. It then creates a sequence of paths $P^1 \subseteq \cdots \subseteq P^{\lfloor n/2 \rfloor}$ by connecting the edges of $M^*$ in order of decreasing weight using intermediate edges.

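Note that since in each iteration the path $P^i$ is built by connecting $P^{i-1}$ to $e^*_i$, we indeed have $P^{i-1} \subseteq P^i$ for every $2 \le i \le \lfloor n/2 \rfloor$. Therefore, $P^1, \ldots, P^{\lfloor n/2 \rfloor}$ are subpaths of $P^{\lfloor n/2 \rfloor}$. In addition, $|P^i| = 2i - 1$ for every $1 \le i \le \lfloor n/2 \rfloor$.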

Lemma 17. $P^{\lfloor n/2 \rfloor}$ is a $\frac{1}{2\sqrt{2}}$-path-robust path.

Lemma 18. The bound is tight.

Theorem 5. Every complete weighted graph contains a $\frac{1}{2\sqrt{2}}$-path-robust path.

Theorem 6. There exists a complete weighted graph in which no path can guarantee α-robustness for $\alpha > 0.843$.







R. Hassin and D. Segev 



Connect-Matching(G, w)
    M* ← a maximum perfect matching w.r.t. w²
    Assume that M* = {e*_1, ..., e*_⌊n/2⌋} and w(e*_1) ≥ ··· ≥ w(e*_⌊n/2⌋)
    P^1 ← G[{e*_1}]
    for i ← 2 to ⌊n/2⌋ do
        e ← an edge which connects e*_i to an endpoint of P^(i−1)
        P^i ← P^(i−1) + e + e*_i
    return P^⌊n/2⌋

Algorithm 3: The Connect-Matching algorithm
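A minimal Python sketch of Connect-Matching follows; it is an illustration under stated assumptions, not the authors' implementation. The hypothetical helper `max_matching_sq` finds the maximum matching with respect to $w^2$ by exhaustive search (a polynomial weighted-matching algorithm would be used in practice), weights are given as a symmetric matrix `w`, and the intermediate edge always attaches $e^*_i$ to the last vertex of the current path, which is one valid choice in a complete graph.

```python
def max_matching_sq(n, w):
    """Brute-force maximum matching with n // 2 edges, weighted by w[u][v]**2.
    Exponential time: a stand-in for a polynomial weighted-matching routine,
    usable only on tiny examples."""
    best, best_val, chosen = [], -1, []

    def extend(free, val):
        nonlocal best, best_val
        if len(free) < 2:
            if val > best_val:
                best, best_val = list(chosen), val
            return
        u, rest = free[0], free[1:]
        if len(free) % 2 == 1:        # odd count: u may be the one unmatched vertex
            extend(rest, val)
        for v in rest:
            chosen.append((u, v))
            extend([x for x in rest if x != v], val + w[u][v] ** 2)
            chosen.pop()

    extend(list(range(n)), 0)
    return best


def connect_matching(n, w):
    """Vertex sequence of the path P^{n//2} built as in Algorithm 3 (n >= 2)."""
    m = sorted(max_matching_sq(n, w), key=lambda e: w[e[0]][e[1]], reverse=True)
    path = list(m[0])                 # P^1 is the heaviest matching edge
    for x, y in m[1:]:
        # Intermediate edge (path[-1], x), followed by the matching edge (x, y).
        path += [x, y]
    return path
```

In the four-vertex example of the test, the matching {(0, 1), (2, 3)} maximizes the sum of squared weights (9 + 1 = 10), so the returned path is 0, 1, 2, 3 with $2 \cdot 2 - 1 = 3$ edges.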



4.2 An Algorithm for 1/2-Tree-Robust Tree

In the following we constructively prove that every complete weighted graph contains a $\frac{1}{2}$-tree-robust tree. We also show that there exists a complete weighted graph in which no tree is α-tree-robust for $\alpha > 0.866$.

As we observed in Section 1, a maximum spanning tree, $T^*$, is generally not α-tree-robust for any $\alpha > 0$. However, we present a polynomial time algorithm which converts $T^*$ into a $\frac{1}{2}$-tree-robust spanning tree. Given $(G, w)$, $|V(G)| = n$, the input for algorithm Robust-Tree, shown in Algorithm 4, is a maximum spanning tree of $G$, $T^*$. We assume that $E(T^*) = \{e^*_1, \ldots, e^*_{n-1}\}$ and $w(e^*_1) \ge \cdots \ge w(e^*_{n-1})$.



Robust-Tree(T*)
    H ← T*
    color(e*_1) ← BLACK
    color(e*_i) ← WHITE, i = 2, ..., n − 1
    while there exists a white edge in H do
        e*_i ← minimum index white edge in H          [main edge]
        color(e*_i) ← BLACK
        if e*_i does not share a common vertex with other black edges then
            y ← an endpoint of e*_i
            x ← an endpoint of a black edge, other than e*_i
            color((x, y)) ← BLACK
            if (x, y) ∉ E(H) then
                H ← H + (x, y)
                e ← a white edge in the cycle created
                H ← H − e
    return H

Algorithm 4: The Robust-Tree algorithm



The algorithm initially sets $H = T^*$ and colors the edges of $H$ white, except for $e^*_1$, which is colored black. Throughout the algorithm two invariants are kept: $H$ is a spanning tree of $G$, and the graph induced by the set of black edges is a tree. Let $T^1 = G[\{e^*_1\}]$ be the graph induced by the black edges just before the algorithm enters the while loop. Let $T^j$ be the graph induced by the black edges at the end of iteration $j - 1$ of the while loop, $j = 2, \ldots, L$.

In each iteration a minimum index white edge, $e^*_i$, is chosen. This edge, also called the main edge for the current iteration, is colored black. If $e^*_i$ does not share a common vertex with any black edge, an edge $(x, y)$ which connects an endpoint $y$ of $e^*_i$ to an endpoint $x$ of another black edge is chosen; $(x, y)$ is colored black and is added to $H$ if it is not already an edge of $H$. Therefore, the set of black edges still induces a tree. However, if $(x, y)$ was added to $H$, then it closes a cycle with $H$. Since originally $e^*_i$ did not share a common vertex with any black edge, this cycle must contain at least one white edge, which is removed to guarantee that $H$ remains a spanning tree. The algorithm terminates when every edge of $H$ is black, and returns $H = T^L$.

Note that in each iteration at most two edges are added to the set of black edges. Therefore, $T^j$ is a tree with at most $2j - 1$ edges, for every $1 \le j \le L$. In addition, since new black edges are always connected to previous ones, $T^j \subseteq T^{j+1}$ for every $1 \le j \le L - 1$. We also have $L \ge \frac{n}{2}$, since at most two white edges are colored black or removed in each iteration.
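The sketch below transcribes Robust-Tree into Python as an illustration only. Assumptions: vertices are $0, \ldots, n-1$; `tstar` lists the edges of $T^*$ in nonincreasing weight order (the only way the algorithm uses the weights); edges are stored as frozensets; and the arbitrary choices of $x$, $y$ and of the removed white edge are resolved deterministically. The names `robust_tree` and `tree_path` are hypothetical.

```python
from collections import deque


def robust_tree(n, tstar):
    """Sketch of Robust-Tree; `tstar` holds the edges of a maximum spanning
    tree on vertices 0..n-1 in nonincreasing weight order."""
    H = {frozenset(e) for e in tstar}
    color = {frozenset(e): "WHITE" for e in tstar}
    color[frozenset(tstar[0])] = "BLACK"

    def tree_path(src, dst):
        # Edges of the unique src-dst path in the spanning tree H (via BFS).
        adj = {v: [] for v in range(n)}
        for e in H:
            a, b = tuple(e)
            adj[a].append(b)
            adj[b].append(a)
        parent, queue = {src: None}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        path, v = [], dst
        while parent[v] is not None:
            path.append(frozenset((v, parent[v])))
            v = parent[v]
        return path

    while True:
        # Main edge: the white edge of H with minimum index in tstar.
        main = next((frozenset(e) for e in tstar
                     if frozenset(e) in H and color[frozenset(e)] == "WHITE"), None)
        if main is None:
            return H                  # every edge of H is black
        color[main] = "BLACK"
        others = sorted((e for e in H if color.get(e) == "BLACK" and e != main),
                        key=sorted)
        if not any(main & e for e in others):
            y = min(main)             # an endpoint of the main edge
            x = min(others[0])        # an endpoint of another black edge
            xy = frozenset((x, y))
            if xy not in H:
                # Adding (x, y) closes a cycle; drop a white edge on it.
                white = next(e for e in tree_path(x, y)
                             if color.get(e) == "WHITE")
                H.add(xy)
                H.remove(white)
            color[xy] = "BLACK"
```

On the path $T^*$ with edges (0, 1), (2, 3), (1, 2) in that order, the main edge (2, 3) is disjoint from the black edge (0, 1), so (0, 2) is added and the white edge (1, 2) on the created cycle is removed.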

Lemma 19. $H = T^L$ is $\frac{1}{2}$-tree-robust.

Lemma 20. The bound $\frac{1}{2}$ is tight.

Theorem 7. Every complete weighted graph contains a $\frac{1}{2}$-tree-robust tree.

Theorem 8. There exists a complete weighted graph in which no tree can guarantee α-robustness for $\alpha > 0.866$.

4.3 An Algorithm for 1/2-Tree-Robust Subgraph

We now present a polynomial time algorithm that, given a complete weighted graph $(G, w)$, $|V(G)| = n$, finds a $\frac{1}{2}$-tree-robust subgraph $H$ such that $|H|$ is at most twice the number of edges in a minimum cardinality $\frac{1}{2}$-tree-robust subgraph, and $w(H)$ is at most twice the weight of a minimum weight $\frac{1}{2}$-tree-robust subgraph.

The input for algorithm $\frac{1}{2}$-Robust-Subgraph, shown in Algorithm 5, is a maximum spanning tree of $G$, $T^*$. We assume that $E(T^*) = \{e^*_1, \ldots, e^*_{n-1}\}$ and $w(e^*_1) \ge \cdots \ge w(e^*_{n-1})$.

The algorithm initially sets $H$ to be the graph induced by the edges $e^*_1, \ldots, e^*_t$, where $t$ is the minimal integer for which $\sum_{i=1}^{t} w(e^*_i) \ge \frac{1}{2} w(T^*)$. We denote this set of edges by $A$. From this point on we assume that $t \ge 1$, since $t = 0$ implies $w(e) = 0$ for every $e \in E(G)$, and then an empty subgraph is an optimal solution.

For each edge $e^*_i \in \{e^*_2, \ldots, e^*_t\}$, in arbitrary order, the algorithm checks whether $e^*_i$ shares a common vertex with an edge $e^*_j$, $j < i$. If this condition is not satisfied, the algorithm adds the edge $(x, y)$ to $H$, where $y$ is an endpoint of $e^*_i$ and $x$ is an endpoint of $e^*_1$, chosen once at the beginning of the algorithm. We denote by $S$ the set of edges added in the foreach loop. Since every edge in $S$ has $x$ as one of its endpoints, $S$ is a star in $G$.

½-Robust-Subgraph(T*)
    t ← minimal integer such that Σ_{i=1..t} w(e*_i) ≥ ½ w(T*)
    H ← G[{e*_1, ..., e*_t}]
    x ← an endpoint of e*_1
    foreach i ∈ {2, ..., t} do
        if e*_i does not share a common vertex with an edge e*_j, j < i then
            y ← an endpoint of e*_i
            H ← H + (x, y)
    return H

Algorithm 5: The ½-Robust-Subgraph algorithm

When the algorithm terminates, it returns the graph $H$, where $E(H) = A \cup S$. Since at most one edge is added in each iteration, $|S| \le t - 1$, and therefore $|H| = |A| + |S| \le 2t - 1$.
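A short Python transcription of the ½-Robust-Subgraph procedure may help make the bound $|H| \le 2t - 1$ concrete. It is a sketch under our own assumptions: `tstar` lists the edges of $T^*$ in nonincreasing weight order, `w` is a dictionary of nonnegative edge weights, and the function name is hypothetical. The comparison $2 \cdot \text{acc} < \text{total}$ avoids dividing by two.

```python
def half_robust_subgraph(tstar, w):
    """Sketch of 1/2-Robust-Subgraph: returns the edge list A + S."""
    total = sum(w[e] for e in tstar)
    # t = minimal integer with w(e*_1) + ... + w(e*_t) >= w(T*) / 2.
    t, acc = 0, 0
    while 2 * acc < total:
        acc += w[tstar[t]]
        t += 1
    A = list(tstar[:t])
    if t == 0:
        return A                      # all weights are zero: empty subgraph
    x = tstar[0][0]                   # an endpoint of e*_1, fixed once
    S = []
    for i in range(1, t):
        seen = {v for e in tstar[:i] for v in e}
        if not set(tstar[i]) & seen:  # e*_i touches no earlier e*_j
            S.append((x, tstar[i][0]))  # star edge (x, y)
    return A + S
```

For weights 5, 4, 1, 1 on a four-edge path, $t = 2$ (since $5 + 4 \ge 11/2$), the second edge is disjoint from the first, and one star edge is added, giving $2t - 1 = 3$ edges.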

Lemma 21. $H$ is $\frac{1}{2}$-tree-robust.

Let $OPT^c$ and $OPT^w$ be a minimum cardinality $\frac{1}{2}$-tree-robust subgraph and a minimum weight $\frac{1}{2}$-tree-robust subgraph, respectively.

Lemma 22. $|H| \le 2|OPT^c| - 1$.

Lemma 23. $w(H) \le 2w(OPT^w)$.

Lemma 24. The bounds on cardinality and weight are tight.
Abstract. In this paper we introduce a new notion of collective tree spanners. We say that a graph $G = (V, E)$ admits a system of $\mu$ collective additive tree $r$-spanners if there is a system $\mathcal{T}(G)$ of at most $\mu$ spanning trees of $G$ such that for any two vertices $x, y$ of $G$ a spanning tree $T \in \mathcal{T}(G)$ exists such that $d_T(x, y) \le d_G(x, y) + r$. Among other results, we show that any chordal graph, chordal bipartite graph or cocomparability graph admits a system of at most $\log_2 n$ collective additive tree 2-spanners and any c-chordal graph admits a system of at most $\log_2 n$ collective additive tree $(2\lfloor c/2 \rfloor)$-spanners. Towards establishing these results, we present a general property for graphs, called $(\alpha, r)$-decomposition, and show that any $(\alpha, r)$-decomposable graph $G$ with $n$ vertices admits a system of at most $\log_{1/\alpha} n$ collective additive tree $2r$-spanners. We also discuss an application of the collective tree spanners to the problem of designing compact and efficient routing schemes in graphs.



1 Introduction 

Many combinatorial and algorithmic problems are concerned with the distance $d_G$ on the vertices of a possibly weighted graph $G = (V, E)$. Approximating $d_G$ by a simpler distance (in particular, by a tree distance $d_T$) is useful in many areas such as communication networks, data analysis, motion planning, image processing, network design, and phylogenetic analysis. An arbitrary metric space (in particular, a finite metric defined by a general graph) might not have enough structure to exploit algorithmically. So the general goal is, for a given graph $G$, to find a simpler graph $H = (V, E')$ with the same vertex set, such that the distance $d_H(u, v)$ in $H$ between two vertices $u, v$ is reasonably close to the corresponding distance $d_G(u, v)$ in the original graph $G$.

There are several ways to measure the quality of this approximation, two of them leading to the notion of a spanner. For $t \ge 1$, a spanning subgraph $H$ of $G$ is called a multiplicative $t$-spanner of $G$ [20,19] if $d_H(u, v) \le t \cdot d_G(u, v)$ for all $u, v \in V$. If $r \ge 0$ and $d_H(u, v) \le d_G(u, v) + r$ for all $u, v \in V$, then $H$ is called an additive $r$-spanner of $G$ [17]. The parameters $t$ and $r$ are called, respectively, the multiplicative and the additive stretch factors. Clearly, every additive $r$-spanner of $G$ is a multiplicative $(r + 1)$-spanner of $G$ (but not vice versa), because $d_G(u, v) \ge 1$ for $u \ne v$ gives $d_G(u, v) + r \le (r + 1)\, d_G(u, v)$. Note that the graphs considered in this paper are assumed to be unweighted.
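Both definitions can be checked directly on small unweighted graphs with breadth-first search. This Python sketch (the function names are ours) assumes that $G$ is connected and that $H$ is a spanning subgraph given by its edge list; it simply compares all pairwise distances against the two stretch conditions.

```python
from collections import deque


def bfs_dist(adj, s):
    # Single-source distances in an unweighted graph.
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def adjacency(vertices, edges):
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def is_additive_spanner(vertices, g_edges, h_edges, r):
    g, h = adjacency(vertices, g_edges), adjacency(vertices, h_edges)
    for u in vertices:
        dg, dh = bfs_dist(g, u), bfs_dist(h, u)
        if any(dh.get(v, float("inf")) > dg[v] + r for v in dg):
            return False
    return True


def is_multiplicative_spanner(vertices, g_edges, h_edges, t):
    g, h = adjacency(vertices, g_edges), adjacency(vertices, h_edges)
    for u in vertices:
        dg, dh = bfs_dist(g, u), bfs_dist(h, u)
        if any(dh.get(v, float("inf")) > t * dg[v] for v in dg):
            return False
    return True
```

For the 4-cycle and the path obtained by deleting one of its edges, the pair joined by the deleted edge has $d_G = 1$ and $d_H = 3$, so $H$ is an additive 2-spanner and a multiplicative 3-spanner, but neither an additive 1-spanner nor a multiplicative 2-spanner.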
Graph spanners have applications in various areas, especially in distributed systems and communication networks. In [20], close relationships were established between the quality of spanners (in terms of the stretch factor and the number of spanner edges $|E'|$) and the time and communication complexities of any synchronizer for the network based on this spanner. Unfortunately, the problem of determining, for a given graph $G$ and two integers $t, m \ge 1$, whether $G$ has a multiplicative $t$-spanner with $m$ or fewer edges is NP-complete (see [19]).

The sparsest spanners are tree spanners. As was shown in [18], they can be used as models for broadcast operations in communication networks. Tree spanners are also favored from the algorithmic point of view: many algorithmic problems are easily solvable on trees. Multiplicative tree $t$-spanners were studied in [6]. It was shown that, for a given graph $G$, the problem of deciding whether $G$ has a multiplicative tree $t$-spanner (the multiplicative tree $t$-spanner problem) is NP-complete for any fixed $t \ge 4$ and is linearly solvable for $t = 1, 2$. Recently, this NP-completeness result was improved: the multiplicative tree $t$-spanner problem is NP-complete for any fixed $t \ge 4$ even on some rather restricted graph classes: chordal graphs [3] and chordal bipartite graphs [4].

Many graph classes (including hypercubes, planar graphs, chordal graphs, chordal bipartite graphs) do not admit any good tree spanner. For every fixed integer $t$ there are planar chordal graphs and planar chordal bipartite graphs that do not admit tree $t$-spanners (additive as well as multiplicative) [8,21]. However, as was shown in [19], any chordal graph with $n$ vertices admits a multiplicative 5-spanner with at most $2n - 2$ edges and a multiplicative 3-spanner with at most $O(n \log n)$ edges (both spanners are constructible in polynomial time). Recently, these results were further improved. In [8], the authors show that every chordal graph admits an additive 4-spanner with at most $2n - 2$ edges and an additive 3-spanner with at most $O(n \log n)$ edges. An additive 4-spanner can be constructed in linear time, while an additive 3-spanner is constructible in $O(m \log n)$ time, where $m$ is the number of edges of $G$. Moreover, the method designed for chordal graphs is extended to all c-chordal graphs. As a result, it was shown that any such graph admits an additive $(c + 1)$-spanner with at most $2n - 2$ edges which is constructible in $O(cn + m)$ time. Recall that a graph $G$ is chordal if its largest induced (chordless) cycles have length at most 3, and c-chordal if its largest induced cycles have length at most $c$.

1.1 Our Results 

In this paper we introduce a new notion of collective tree spanners, a notion slightly weaker than that of a tree spanner and slightly stronger than that of a sparse spanner. We say that a graph $G = (V, E)$ admits a system of $\mu$ collective additive tree $r$-spanners if there is a system $\mathcal{T}(G)$ of at most $\mu$ spanning trees of $G$ such that for any two vertices $x, y$ of $G$ a spanning tree $T \in \mathcal{T}(G)$ exists such that $d_T(x, y) \le d_G(x, y) + r$ (a multiplicative variant of this notion can be defined analogously). Clearly, if $G$ admits a system of $\mu$ collective additive tree $r$-spanners, then $G$ admits an additive $r$-spanner with at most $\mu \times (n - 1)$ edges (take the union of all those trees), and if $\mu = 1$ then $G$ admits an additive tree $r$-spanner. Note also that any graph on $n$ vertices admits a system of at most $n - 1$ collective additive tree 0-spanners (take $n - 1$ Breadth-First-Search trees rooted at different vertices of $G$).
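The last remark can be checked mechanically: for $x \ne y$ at least one of the two endpoints is among the $n - 1$ roots, and a BFS tree preserves all distances from its root. The following Python sketch (function names are ours; $G$ is assumed connected, with vertices $0, \ldots, n-1$) builds the trees and tests the collective additive property for a given stretch $r$.

```python
from collections import deque


def to_adj(n, edges):
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_tree(adj, root):
    # Edges (parent, child) of a BFS tree rooted at `root`.
    seen, queue, edges = {root}, deque([root]), []
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                edges.append((u, v))
                queue.append(v)
    return edges


def dists(adj, s):
    d, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in d:
                d[v] = d[u] + 1
                queue.append(v)
    return d


def bfs_tree_system(n, edges):
    """BFS trees rooted at vertices 0..n-2 (vertex n-1 is left out)."""
    adj = to_adj(n, edges)
    return [bfs_tree(adj, root) for root in range(n - 1)]


def is_collective_additive(n, edges, trees, r):
    g = to_adj(n, edges)
    tree_adj = [to_adj(n, t) for t in trees]
    for x in range(n):
        dg = dists(g, x)
        dts = [dists(a, x) for a in tree_adj]
        if any(min(dt[y] for dt in dts) > dg[y] + r for y in range(n)):
            return False
    return True
```

On the 5-cycle, the four BFS trees together are collective additive tree 0-spanners, while any single one of them stretches the edge it omits.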

The introduction of this new notion was inspired by the work [1] of Bartal and subsequent work [7]. For example, motivated by Bartal's work on probabilistic approximation of general metrics with tree metrics, [7] gives a polynomial time algorithm that, given a finite $n$-point metric $G$, constructs $O(n \log n)$ trees and a probability distribution on them such that the expected multiplicative stretch of any edge of $G$ in a tree chosen according to this distribution is at most $O(\log n \log \log n)$. These results led to approximation algorithms for a number of optimization problems (see [1,7] for more details).

In Section 2 we define a large class of graphs, called $(\alpha, r)$-decomposable, and show that any $(\alpha, r)$-decomposable graph $G$ with $n$ vertices admits a system of at most $\log_{1/\alpha} n$ collective additive tree $2r$-spanners. Then, in Sections 3 and 4, we show that chordal graphs, chordal bipartite graphs and cocomparability graphs are all $(1/2, 1)$-decomposable graphs, implying that each graph from those families admits a system of at most $\log_2 n$ collective additive tree 2-spanners. These results are complemented by lower bounds, which say that any system of collective additive tree 1-spanners must have $\Omega(\sqrt{n})$ spanning trees for some chordal graphs and $\Omega(n)$ spanning trees for some chordal bipartite graphs and some cocomparability graphs. Furthermore, we show that any c-chordal graph is $(1/2, \lfloor c/2 \rfloor)$-decomposable, implying that each c-chordal graph admits a system of at most $\log_2 n$ collective additive tree $(2\lfloor c/2 \rfloor)$-spanners.

Thus, as a byproduct, we get that chordal graphs, chordal bipartite graphs and cocomparability graphs admit additive 2-spanners with at most $(n - 1)\log_2 n$ edges, and c-chordal graphs admit additive $(2\lfloor c/2 \rfloor)$-spanners with at most $(n - 1)\log_2 n$ edges. Our result for chordal graphs improves the known results from [19] and [8] on 3-spanners and answers the question posed in [8] whether chordal graphs admit additive 2-spanners with $O(n \log n)$ edges.

In Section 5 we discuss an application of the collective tree spanners to the problem of designing compact and efficient routing schemes in graphs. For any graph on $n$ vertices admitting a system of at most $\mu$ collective additive tree $r$-spanners, there is a routing scheme of deviation $r$ with addresses and routing tables of size $O(\mu \log^2 n / \log \log n)$ bits per vertex (for details see Section 5). This leads, for example, to a routing scheme of deviation $2\lfloor c/2 \rfloor$ with addresses and routing tables of size $O(\log^3 n / \log \log n)$ bits per vertex on the class of c-chordal graphs. The latter improves the recent result on routing on c-chordal graphs obtained in [13] (see also [12] for the case of chordal graphs). We conclude the paper with Section 6, where we discuss some further developments and future directions.



1.2 Basic Notions and Notations 

All graphs occurring in this paper are connected, finite, undirected, loopless and without multiple edges. In a graph $G = (V, E)$ the length of a path from a vertex $v$ to a vertex $u$ is the number of edges in the path. The distance $d_G(u, v)$ between the vertices $u$ and $v$ is the length of a shortest path connecting $u$ and $v$.

For a subset $S \subseteq V$, let $rad_G(S)$ and $diam_G(S)$ be the radius and the diameter, respectively, of $S$ in $G$, i.e., $rad_G(S) = \min_{v \in V}\{\max_{u \in S}\{d_G(u, v)\}\}$ and $diam_G(S) = \max_{u, v \in S}\{d_G(u, v)\}$. A vertex $v \in V$ such that $d_G(u, v) \le rad_G(S)$ for any $u \in S$ is called a central vertex for $S$. The value $rad_G(V)$ is called the radius of $G$. Let also $N(v)$ ($N[v]$) denote the open (closed) neighborhood of a vertex $v$ in $G$, i.e., $N(v) = \{u \in V : uv \in E(G)\}$ and $N[v] = N(v) \cup \{v\}$.
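The definitions of $rad_G(S)$, $diam_G(S)$ and a central vertex translate directly into code. The sketch below (function names are ours) runs one BFS from every vertex, which is one straightforward, if not the fastest, way to find a central vertex of a set $S$ in a connected unweighted graph given as an adjacency dictionary.

```python
from collections import deque


def bfs_dist(adj, s):
    dist, queue = {s: 0}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def radius_and_center(adj, S):
    """rad_G(S) = min over v in V of max over u in S of d_G(u, v),
    returned together with one central vertex attaining it."""
    best = None
    for v in adj:
        dv = bfs_dist(adj, v)
        ecc = max(dv[u] for u in S)   # eccentricity of v with respect to S
        if best is None or ecc < best[0]:
            best = (ecc, v)
    return best


def diameter(adj, S):
    return max(bfs_dist(adj, u)[v] for u in S for v in S)
```

On the path 0-1-2-3-4, the set $\{0, 4\}$ has radius 2 with central vertex 2 and diameter 4, and the radius of the whole path is also 2.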

2 (α, r)-Decomposable Graphs and Their Collective Tree Spanners

Different balanced separators in graphs have been used by many authors in designing efficient graph algorithms. For example, bounded size balanced separators and bounded diameter balanced separators were recently employed in [16] for designing compact distance labeling schemes for different so-called well-separated families of graphs. We extend those ideas and apply them to our problem.

Let $\alpha$ be a positive real number smaller than 1 and $r$ be a non-negative integer. We say that an $n$-vertex graph $G = (V, E)$ is $(\alpha, r)$-decomposable if the following three conditions hold for $G$:

Balanced Separator condition: there exists a set $S \subseteq V$ of vertices in $G$ whose removal leaves no connected component with more than $\alpha n$ vertices;
Bounded Separator-Radius condition: $rad_G(S) \le r$, i.e., there exists a vertex $c$ in $G$ (called a central vertex for $S$) such that $d_G(v, c) \le r$ for any $v \in S$;
Hereditary Family condition: each connected component of the graph obtained from $G$ by removing the vertices of $S$ is also an $(\alpha, r)$-decomposable graph.

Note that, by definition, any graph of radius at most $r$ is $(\alpha, r)$-decomposable.

Using the first and third conditions, one can construct for any $(\alpha, r)$-decomposable graph $G$ a (rooted) balanced decomposition tree $BT(G)$ as follows. If $G$ has radius at most $r$, then $BT(G)$ is a one-node tree. Otherwise, find a balanced separator $S$ in $G$, which exists according to the Balanced Separator condition. Let $G_1, G_2, \ldots, G_p$ be the connected components of the graph $G - S$ obtained from $G$ by removing the vertices of $S$. For each graph $G_i$ ($i = 1, \ldots, p$), which is $(\alpha, r)$-decomposable by the Hereditary Family condition, construct a balanced decomposition tree $BT(G_i)$ recursively, and build $BT(G)$ by taking $S$ to be the root and connecting the root of each tree $BT(G_i)$ as a child of $S$. See Figure 1 for an illustration. Clearly, the nodes of $BT(G)$ represent a partition of the vertex set $V$ of $G$ into clusters $S_1, S_2, \ldots, S_q$ of radius at most $r$ each. For a node $X$ of $BT(G)$, denote by $G(\downarrow X)$ the (connected) subgraph of $G$ induced by the vertices $\bigcup\{Y : Y \text{ is a descendant of } X \text{ in } BT(G)\}$ (here we assume that $X$ is a descendant of itself).

It is easy to see that a balanced decomposition tree $BT(G)$ of a graph $G$ with $n$ vertices and $m$ edges has depth at most $\log_{1/\alpha} n$, which is $O(\log_2 n)$ if $\alpha$ is a constant. Moreover, assuming that a balanced and bounded radius separator can be found in polynomial, say $p(n)$, time (for the special graph classes we consider later, $p(n)$ will be at most $O(n^3)$), the tree $BT(G)$ can be constructed in $O((p(n) + m) \log_{1/\alpha} n)$ total time. Indeed, in each level of recursion we need to find balanced and bounded radius separators in the current disjoint subgraphs and to construct the corresponding subgraphs of the next level. Also, since the graph sizes are reduced by a factor $\alpha$, the recursion depth is at most $\log_{1/\alpha} n$.

Fig. 1. (a) A graph $G$, (b) its balanced decomposition tree $BT(G)$ and (c) an induced subgraph $G(\downarrow X)$ of $G$.
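As a concrete instance of this construction (our own illustration, not an example from the paper), a tree is $(1/2, 0)$-decomposable: removing a centroid vertex leaves components with at most $n/2$ vertices each, a single vertex has radius 0, and the components are again trees. $BT(G)$ is then the classical centroid decomposition, and its depth is at most $\log_2 n$. The function names below are hypothetical.

```python
def component_sizes(adj, root, alive):
    """DFS inside `alive`: parent pointers and subtree sizes rooted at `root`."""
    parent, order, stack = {root: None}, [], [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in adj[u]:
            if v in alive and v not in parent:
                parent[v] = u
                stack.append(v)
    size = {}
    for u in reversed(order):         # children are finished before parents
        size[u] = 1 + sum(size[v] for v in adj[u]
                          if v in alive and parent.get(v) == u)
    return parent, size


def centroid(adj, alive):
    # A vertex whose removal leaves parts of size <= |alive| / 2.
    parent, size = component_sizes(adj, min(alive), alive)
    n = len(alive)
    for u in sorted(alive):
        parts = [size[v] for v in adj[u] if v in alive and parent.get(v) == u]
        parts.append(n - size[u])     # the part containing u's parent
        if max(parts) <= n / 2:
            return u


def centroid_decomposition(adj):
    """BT(G) of a tree G, listed as (depth, separator vertex) pairs."""
    nodes, work = [], [(set(adj), 0)]
    while work:
        alive, depth = work.pop()
        c = centroid(adj, alive)
        nodes.append((depth, c))
        rest = alive - {c}
        while rest:                   # connected components of alive - {c}
            _, size = component_sizes(adj, min(rest), rest)
            work.append((set(size), depth + 1))
            rest -= set(size)
    return nodes
```

For a path on 15 vertices the separators are found at depths 0 through 3, matching $\lfloor \log_2 15 \rfloor = 3$, and every vertex is a separator exactly once.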

Consider now two arbitrary vertices $x$ and $y$ of an $(\alpha, r)$-decomposable graph $G$ and let $S(x)$ and $S(y)$ be the nodes of $BT(G)$ containing $x$ and $y$, respectively. Let also $NCA_{BT(G)}(S(x), S(y))$ be the nearest common ancestor of the nodes $S(x)$ and $S(y)$ in $BT(G)$, and let $(X_0, X_1, \ldots, X_t)$ be the path of $BT(G)$ connecting the root $X_0$ of $BT(G)$ with $NCA_{BT(G)}(S(x), S(y)) = X_t$ (in other words, $X_0, X_1, \ldots, X_t$ are the common ancestors of $S(x)$ and $S(y)$). The following lemmata are crucial to all our subsequent results.

Lemma 1. Any path $P_{x,y}$ connecting vertices $x$ and $y$ in $G$ contains a vertex from $X_0 \cup X_1 \cup \cdots \cup X_t$.

Let $SP_{x,y}$ be a shortest path of $G$ connecting vertices $x$ and $y$, and let $X_i$ be the node of the path $(X_0, X_1, \ldots, X_t)$ with the smallest index such that $SP_{x,y} \cap X_i \neq \emptyset$ in $G$. Then the following lemma holds.

Lemma 2. We have $d_G(x, y) = d_{G'}(x, y)$, where $G' := G(\downarrow X_i)$.

For the graph $G' = G(\downarrow X_i)$, consider an arbitrary Breadth-First-Search tree (BFS-tree) $T'$ rooted at a central vertex $c$ for $X_i$, i.e., a vertex $c$ such that $d_{G'}(v, c) \le r$ for any $v \in X_i$. Such a vertex exists in $G'$ since $G'$ is an $(\alpha, r)$-decomposable graph and $X_i$ is its balanced and bounded radius separator. The tree $T'$ has the following distance property with respect to those vertices $x, y$.

Lemma 3. We have $d_{T'}(x, y) \le d_G(x, y) + 2r$.
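One way to see Lemma 3 is the following short derivation, which is our own reconstruction and not necessarily the authors' proof. Pick $z \in SP_{x,y} \cap X_i$. By the choice of $i$, the path $SP_{x,y}$ avoids $X_0, \ldots, X_{i-1}$, so it lies entirely in $G'$; hence $d_{G'}(x, z) + d_{G'}(z, y) \le |SP_{x,y}| = d_G(x, y)$. Since $T'$ is a BFS-tree rooted at $c$, it preserves distances from $c$, and $d_{G'}(z, c) \le r$:

```latex
\begin{aligned}
d_{T'}(x,y) &\le d_{T'}(x,c) + d_{T'}(c,y) = d_{G'}(x,c) + d_{G'}(c,y) \\
  &\le \bigl(d_{G'}(x,z) + d_{G'}(z,c)\bigr) + \bigl(d_{G'}(c,z) + d_{G'}(z,y)\bigr) \\
  &\le d_{G'}(x,z) + d_{G'}(z,y) + 2r \le d_G(x,y) + 2r.
\end{aligned}
```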

Let now $B^i_1, \ldots, B^i_{p_i}$ be the nodes at depth $i$ of the tree $BT(G)$. For each subgraph $G^i_j := G(\downarrow B^i_j)$ of $G$ ($i = 0, 1, \ldots, depth(BT(G))$, $j = 1, 2, \ldots, p_i$), denote by $T^i_j$ a BFS-tree of the graph $G^i_j$ rooted at a central vertex $c^i_j$ for $B^i_j$. The trees $T^i_j$ ($i = 0, 1, \ldots, depth(BT(G))$, $j = 1, 2, \ldots, p_i$) are called local subtrees of $G$, and, given the balanced decomposition tree $BT(G)$, they can be constructed in $O((t(n) + m) \log_{1/\alpha} n)$ total time, where $t(n)$ is the time needed to find a central vertex $c^i_j$ for $B^i_j$ (a trivial upper bound for $t(n)$ is $O(n^3)$). From Lemma 3 the following general result can be deduced.

Theorem 1. Let $G$ be an $(\alpha, r)$-decomposable graph, $BT(G)$ be its balanced decomposition tree and $\mathcal{LT}(G) = \{T^i_j : i = 0, 1, \ldots, depth(BT(G)),\ j = 1, 2, \ldots, p_i\}$ be its local subtrees. Then, for any two vertices $x$ and $y$ of $G$, there exists a local subtree $T^i_j$ in $\mathcal{LT}(G)$ such that $d_{T^i_j}(x, y) \le d_G(x, y) + 2r$.

This theorem implies two important results for the class of $(\alpha, r)$-decomposable graphs. Let $G$ be an $(\alpha, r)$-decomposable graph with $n$ vertices and $m$ edges, $BT(G)$ be its balanced decomposition tree and $\mathcal{LT}(G)$ be the family of its local subtrees (defined above). Consider a graph $H$ obtained by taking the union of all local subtrees of $G$ (by putting them all together), i.e., $H := \bigcup\{T^i_j : T^i_j \in \mathcal{LT}(G)\} = (V, \bigcup\{E(T^i_j) : T^i_j \in \mathcal{LT}(G)\})$. Clearly, $H$ is a spanning subgraph of $G$, constructible in $O((p(n) + t(n) + m) \log_{1/\alpha} n)$ total time, and, for any two vertices $x$ and $y$ of $G$, $d_H(x, y) \le d_G(x, y) + 2r$ holds. Also, since for every level $i$ ($i = 0, 1, \ldots, depth(BT(G))$) of the balanced decomposition tree $BT(G)$ the corresponding local subtrees $T^i_1, \ldots, T^i_{p_i}$ are pairwise vertex-disjoint, their union has at most $n - 1$ edges. Therefore, $H$ cannot have more than $(n - 1) \log_{1/\alpha} n$ edges in total. Thus, we have proven the following result.

Theorem 2. Any $(\alpha, r)$-decomposable graph $G$ with $n$ vertices admits an additive $2r$-spanner with at most $(n - 1) \log_{1/\alpha} n$ edges.

Instead of taking the union of all local subtrees of $G$, one can fix $i$ ($i \in \{0, 1, \ldots, depth(BT(G))\}$) and consider separately the union of only the local subtrees $T^i_1, \ldots, T^i_{p_i}$ corresponding to level $i$ of the decomposition tree $BT(G)$, and then extend that forest in linear $O(m)$ time to a spanning tree $T^i$ of $G$ (using, for example, a variant of Kruskal's spanning tree algorithm for unweighted graphs). We call this tree $T^i$ the spanning tree of $G$ corresponding to level $i$ of the balanced decomposition tree $BT(G)$. In this way we can obtain at most $\log_{1/\alpha} n$ spanning trees for $G$, one for each level $i$ of $BT(G)$. Denote the collection of those spanning trees by $\mathcal{T}(G)$. By Theorem 1, it is rather straightforward to show that for any two vertices $x$ and $y$ of $G$ there exists a spanning tree $T^i$ in $\mathcal{T}(G)$ such that $d_{T^i}(x, y) \le d_G(x, y) + 2r$. Thus, we have

Theorem 3. Any $(\alpha, r)$-decomposable graph $G$ with $n$ vertices admits a system $\mathcal{T}(G)$ of at most $\log_{1/\alpha} n$ collective additive tree $2r$-spanners.

Note that such a system $\mathcal{T}(G)$ for an $(\alpha, r)$-decomposable graph $G$ with $n$ vertices and $m$ edges can be constructed in $O((p(n) + t(n) + m) \log_{1/\alpha} n)$ time, where $p(n)$ is the time needed to find a balanced and bounded radius separator $S$ and $t(n)$ is the time needed to find a central vertex for $S$.
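The whole construction (decomposition tree, local subtrees, one spanning tree per level) fits in a short sketch. It is a minimal illustration under our own assumptions, not the authors' implementation. The hypothetical `separator` routine tries every radius-$r$ ball $\{v : d(c, v) \le r\}$ of the current subgraph and keeps the first one that is balanced. This exhaustive search is a slow stand-in for $p(n)$, but it never misses: any separator of radius at most $r$ with center $c$ lies inside the ball around $c$, and removing the larger set only shrinks the components. Forest edges are fed to a union-find first, so the Kruskal-style extension keeps every local subtree.

```python
from collections import deque


def bfs(adj, s, allowed):
    """Distances from s and BFS-tree edges inside the subgraph induced by `allowed`."""
    dist, queue, tree = {s: 0}, deque([s]), []
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                tree.append((u, v))
                queue.append(v)
    return dist, tree


def components(adj, allowed):
    comps, left = [], set(allowed)
    while left:
        comp = set(bfs(adj, min(left), left)[0])
        comps.append(comp)
        left -= comp
    return comps


def separator(adj, vs, alpha, r):
    # Radius-r balls inside G[vs]; prefer one covering vs (one-node case).
    balls = [({v for v, k in bfs(adj, c, vs)[0].items() if k <= r}, c)
             for c in sorted(vs)]
    for S, c in balls:
        if S == vs:
            return S, c
    for S, c in balls:
        if all(len(k) <= alpha * len(vs) for k in components(adj, vs - S)):
            return S, c
    raise ValueError("no balanced radius-r ball separator")


def find(parent, a):
    while parent[a] != a:
        parent[a] = parent[parent[a]]  # path halving
        a = parent[a]
    return a


def collective_trees(n, edges, alpha, r):
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    levels, work = {}, [(set(range(n)), 0)]
    while work:                       # build BT(G) node by node
        vs, depth = work.pop()
        S, c = separator(adj, vs, alpha, r)
        # Local subtree: a BFS tree of G(down X) rooted at the central vertex c.
        levels.setdefault(depth, []).extend(bfs(adj, c, vs)[1])
        work.extend((comp, depth + 1) for comp in components(adj, vs - S))
    trees = []
    for depth in sorted(levels):      # one spanning tree per level of BT(G)
        parent, tree = list(range(n)), []
        for u, v in levels[depth] + list(edges):  # forest edges first
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v))
        trees.append(tree)
    return adj, trees


def is_collective(n, adj, trees, bound):
    """Does every pair x, y have a tree T with d_T(x, y) <= d_G(x, y) + bound?"""
    everything = set(range(n))
    tree_adj = []
    for t in trees:
        a = {v: [] for v in range(n)}
        for u, v in t:
            a[u].append(v)
            a[v].append(u)
        tree_adj.append(a)
    for x in range(n):
        dg = bfs(adj, x, everything)[0]
        dts = [bfs(a, x, everything)[0] for a in tree_adj]
        if any(min(d[y] for d in dts) > dg[y] + bound for y in range(n)):
            return False
    return True
```

On the square of the path on 8 vertices (a chordal graph) with $\alpha = 1/2$ and $r = 1$, the sketch produces two spanning trees, and every pair of vertices is served with additive stretch at most 2.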
3 Acyclic Hypergraphs, Chordal Graphs, and (α, r)-Decomposable Graphs

Let $H = (V, \mathcal{E})$ be a hypergraph with the vertex set $V$ and the hyperedge set $\mathcal{E}$, i.e., $\mathcal{E}$ is a set of non-empty subsets of $V$. For every vertex $v \in V$, let $\mathcal{E}(v) = \{e \in \mathcal{E} : v \in e\}$. The 2-section graph $2SEC(H)$ of a hypergraph $H$ has $V$ as its vertex set, and two distinct vertices are adjacent in $2SEC(H)$ if and only if they are contained in a common hyperedge of $H$. A hypergraph $H$ is called conformal if every clique (a set of pairwise adjacent vertices) of $2SEC(H)$ is contained in a hyperedge $e \in \mathcal{E}$, and a hypergraph $H$ is called acyclic if there is a tree $T$ with node set $\mathcal{E}$ such that for all vertices $v \in V$, $\mathcal{E}(v)$ induces a subtree $T_v$ of $T$. For these and other hypergraph notions see [2].

The following theorem presents two well-known characterizations of acyclic hypergraphs. Let $\mathcal{C}(G)$ be the set of all maximal (by inclusion) cliques of a graph $G = (V, E)$. The hypergraph $(V, \mathcal{C}(G))$ is called the clique hypergraph of $G$. Recall that a graph $G$ is chordal if it does not contain any induced cycles of length greater than 3. A vertex $v$ of a graph $G$ is called simplicial if its neighborhood $N(v)$ forms a clique in $G$.

Theorem 4. (see [2,5]) Let $H = (V, \mathcal{E})$ be a hypergraph. Then the following conditions are equivalent:

(i) $H$ is an acyclic hypergraph;

(ii) $H$ is conformal and the 2-section graph $2SEC(H)$ of $H$ is a chordal graph;

(iii) $H$ is the clique hypergraph $(V, \mathcal{C}(G))$ of some chordal graph $G = (V, E)$.

Let now $G = (V, E)$ be an arbitrary graph and $r$ be a positive integer. We say that $G$ admits a radius $r$ acyclic covering if there is a family $\mathcal{S}(G) = \{S_1, \ldots, S_k\}$ of subsets of $V$ such that

(1) $\bigcup_{i=1}^{k} S_i = V$;

(2) for any edge $xy$ of $G$ there is a subset $S_i$ ($i \in \{1, \ldots, k\}$) with $x, y \in S_i$;

(3) $H = (V, \mathcal{S}(G))$ is an acyclic hypergraph;

(4) $rad_G(S_i) \le r$ for each $i = 1, \ldots, k$.

A class of graphs $\mathcal{F}$ is called hereditary if every induced subgraph of a graph $G$ belongs to $\mathcal{F}$ whenever $G$ is in $\mathcal{F}$. A class of graphs $\mathcal{F}$ is called $(\alpha, r)$-decomposable if every graph $G$ from $\mathcal{F}$ is $(\alpha, r)$-decomposable.

Theorem 5. Let $\mathcal{F}$ be a hereditary class of graphs such that any $G \in \mathcal{F}$ admits a radius $r$ acyclic covering. Then $\mathcal{F}$ is a $(1/2, r)$-decomposable class of graphs.

Since for a chordal graph $G = (V, E)$ the clique hypergraph $(V, \mathcal{C}(G))$ is acyclic, every maximal clique has radius at most 1, and chordal graphs form a hereditary class of graphs, from Theorem 5 and Theorems 2 and 3 we immediately conclude

Corollary 1. Any chordal graph $G$ with $n$ vertices and $m$ edges admits an additive 2-spanner with at most $(n - 1)\log_2 n$ edges, and such a sparse spanner can be constructed in $O(m \log_2 n)$ time.
Corollary 2. Any chordal graph $G$ with $n$ vertices and $m$ edges admits a system $\mathcal{T}(G)$ of at most $\log_2 n$ collective additive tree 2-spanners, and such a system of spanning trees can be constructed in $O(m \log_2 n)$ time.

Note that, since any additive $r$-spanner is a multiplicative $(r + 1)$-spanner, Corollary 1 improves a known result of Peleg and Schäffer on sparse spanners of chordal graphs. In [19], they proved that any chordal graph with $n$ vertices admits a multiplicative 3-spanner with at most $O(n \log_2 n)$ edges and a multiplicative 5-spanner with at most $2n - 2$ edges. Both spanners can be constructed in polynomial time. Note also that their result on multiplicative 5-spanners was earlier improved in [8], where the authors showed that any chordal graph with $n$ vertices admits an additive 4-spanner with at most $2n - 2$ edges, constructible in linear time. Motivated by this and Corollary 2, it is natural to ask whether a system of a constant number of collective additive tree 4-spanners exists for a chordal graph (or, more generally, for which $r$ a system of a constant number of collective additive tree $r$-spanners exists for any chordal graph). Recall that the problem of whether a chordal graph admits a (single) multiplicative tree $t$-spanner is NP-complete for any $t \ge 4$ [3].

Peleg and Schäffer also showed in [19] that there are $n$-vertex chordal graphs for which any multiplicative 2-spanner needs at least $\Omega(n^{3/2})$ edges. This result leads to the following observation on collective additive tree 1-spanners of chordal graphs: the union of $\mu$ such trees is an additive 1-spanner, hence a multiplicative 2-spanner, with at most $\mu(n - 1)$ edges, so $\mu(n - 1) = \Omega(n^{3/2})$ forces $\mu = \Omega(\sqrt{n})$.

Observation 6. There are $n$-vertex chordal graphs for which any system of collective additive tree 1-spanners needs at least $\Omega(\sqrt{n})$ spanning trees.

4 Collective Tree Spanners in c-Chordal Graphs 

A graph G is c-chordal if it does not contain any induced cycles of length greater 
than c. c-Chordal graphs naturally generalize the class of chordal graphs. Chordal 
graphs are precisely the 3-chordal graphs. 

Theorem 7. The class of c-chordal graphs is (1/2, [c/2\) -decomposable. 

A balanced separator of radius at most $\lfloor c/2 \rfloor$ of a $c$-chordal graph $G$ on $n$ vertices can be found in $O(n^{\ldots})$ time. Thus, from Theorems 2 and 3, we conclude

Corollary 3. Any $c$-chordal graph $G$ with $n$ vertices admits an additive $(2\lfloor c/2 \rfloor)$-spanner with at most $(n - 1)\log_2 n$ edges, and such a sparse spanner can be constructed in $O(n^{\ldots} \log_2 n)$ time.



Corollary 4. Any $c$-chordal graph $G$ with $n$ vertices admits a system $\mathcal{T}(G)$ of at most $\log_2 n$ collective additive tree $(2\lfloor c/2 \rfloor)$-spanners, and such a system of spanning trees can be constructed in $O(n^{\ldots} \log_2 n)$ time.
Note that there are $c$-chordal graphs which do not admit any radius $r$ acyclic covering with $r < \lfloor c/2 \rfloor$. Consider, for example, the complement $\overline{C_6}$ of an induced cycle $C_6 = (a - b - c - d - e - f - a)$, which is a 4-chordal graph. A family $\mathcal{S}(\overline{C_6})$ consisting of the single set $\{a, b, c, d, e, f\}$ gives a trivial radius $2 = \lfloor 4/2 \rfloor$ acyclic covering of $\overline{C_6}$. A simple consideration shows that no radius 1 acyclic covering can exist for $\overline{C_6}$: it is impossible, by simply adding new edges to $\overline{C_6}$, to get a chordal graph in which each maximal clique induces a radius one subgraph of $\overline{C_6}$.

Next we show that another interesting subclass of 4-chordal graphs, namely the class of chordal bipartite graphs, does admit radius 1 acyclic coverings. A bipartite graph $G = (X \cup Y, E)$ is chordal bipartite if it does not contain any induced cycles of length greater than 4.

For a chordal bipartite graph $G$, consider the hypergraph $H = (X \cup Y, \{N[y] : y \in Y\})$. In the full version we show that $H$ is an acyclic hypergraph. Chordal bipartite graphs form a hereditary class of graphs. Moreover, for any chordal bipartite graph $G = (X \cup Y, E)$, the family $\{N[y] : y \in Y\}$ of subsets of $X \cup Y$ satisfies all four conditions of a radius 1 acyclic covering. Hence, by Theorem 5 we have

Theorem 8. The class of chordal bipartite graphs is $(1/2, 1)$-decomposable.
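The acyclicity of the hypergraph $H$ used above can be tested mechanically. The following Python sketch builds the family $\{N[y] : y \in Y\}$ and runs the GYO (Graham) reduction, a standard decision procedure for alpha-acyclicity. We assume that alpha-acyclicity is the notion of hypergraph acyclicity meant here. The sketch does not check the other conditions of a radius 1 acyclic covering, and the function names are ours.

```python
def closed_neighborhood_hypergraph(Y, adj):
    """Hyperedges N[y] = {y} | N(y) for every y in Y (adj: vertex -> neighbors)."""
    return [frozenset(adj[y]) | {y} for y in Y]

def is_alpha_acyclic(hyperedges):
    """GYO reduction: repeatedly (1) delete vertices lying in exactly one hyperedge
    and (2) delete hyperedges that are empty or contained in another hyperedge.
    The hypergraph is alpha-acyclic iff nothing survives."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        count = {}
        for e in edges:
            for v in e:
                count[v] = count.get(v, 0) + 1
        for e in edges:                                   # rule (1)
            lonely = {v for v in e if count[v] == 1}
            if lonely:
                e -= lonely
                changed = True
        kept = []
        for i, e in enumerate(edges):                     # rule (2); equal copies: keep the first
            redundant = not e or any(
                j != i and e <= f and (e != f or j < i) for j, f in enumerate(edges))
            if redundant:
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges

# C4 = x1-y1-x2-y2-x1 is chordal bipartite; C6 = x1-y1-x2-y2-x3-y3-x1 is not.
C4_adj = {"y1": ["x1", "x2"], "y2": ["x1", "x2"]}
C6_adj = {"y1": ["x1", "x2"], "y2": ["x2", "x3"], "y3": ["x3", "x1"]}
```

For $C_4$ the reduction empties the hypergraph. For the induced $C_6$, three pairwise-overlapping edges $\{x_1, x_2\}, \{x_2, x_3\}, \{x_3, x_1\}$ remain, so the test reports a cycle.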

Another interesting subclass of 4-chordal graphs is the class of cocomparability graphs. It is well known that cocomparability graphs contain all interval graphs, all permutation graphs and all trapezoid graphs (see, e.g., [5] for the definitions). Since $\overline{C_6}$ is a cocomparability graph, cocomparability graphs generally do not admit radius 1 acyclic coverings (although we can show that both the class of permutation graphs and the class of trapezoid graphs do admit radius 1 acyclic coverings [9]). In the full version we present a very simple direct proof that the class of cocomparability graphs is $(1/2, 1)$-decomposable.

Theorem 9. The class of cocomparability graphs is $(1/2, 1)$-decomposable.



Corollary 5. Any chordal bipartite graph or cocomparability graph $G$ with $n$ vertices and $m$ edges admits an additive 2-spanner with at most $(n - 1)\log_2 n$ edges. Such a sparse spanner can be constructed in $O(nm \log_2 n)$ time for chordal bipartite graphs and in $O(m \log_2 n)$ time for cocomparability graphs.



Corollary 6. Any chordal bipartite graph or cocomparability graph $G$ with $n$ vertices and $m$ edges admits a system $\mathcal{T}(G)$ of at most $\log_2 n$ collective additive tree 2-spanners. Such a system of spanning trees can be constructed in $O(nm \log_2 n)$ time for chordal bipartite graphs and in $O(m \log_2 n)$ time for cocomparability graphs.

Recall that the problem of whether a chordal bipartite graph admits a (single) multiplicative tree $t$-spanner is NP-complete for any $t > 3$ [4]. Also, any chordal bipartite graph $G$ with $n$ vertices admits an additive 4-spanner with at most $2n - 2$ edges, which is constructible in linear time [8]. Again, it is interesting to know whether a system of a constant number of collective additive tree 4-spanners exists for a chordal bipartite graph.

It is known [21] that any cocomparability graph admits a (single) additive tree 3-spanner. In a forthcoming paper [11], using a different technique, we show that the result stated in Corollary 6 can be further improved. Any cocomparability graph admits a system of two collective additive tree 2-spanners, and there are cocomparability graphs which do not have any (single) additive tree 2-spanner.

We have the following observation on collective additive tree 1-spanners for 
chordal bipartite graphs and cocomparability graphs. 

Observation 10. There are chordal bipartite graphs and cocomparability graphs on $n$ vertices for which any system of collective additive tree 1-spanners needs at least $\Omega(n)$ spanning trees.

5 Collective Tree Spanners and Routing Labeling Schemes

An important problem in large-scale communication networks is the design of routing schemes that produce efficient routes and have relatively low memory requirements. Following [18], one can give the following formal definition. A family $\mathfrak{R}$ of graphs is said to have an $l(n)$ routing labeling scheme if two things hold. First, there is a function $L$ labeling the vertices of each $n$-vertex graph in $\mathfrak{R}$ with distinct labels of up to $l(n)$ bits. Second, there exists an efficient algorithm, called the routing decision. Given the label of a source vertex $v$ and the label of the destination vertex (the header of the packet), it decides whether this packet has already reached its destination, and if not, to which neighbor of $v$ to forward the packet. It does so using only those two labels, in time polynomial in their length. The efficiency of a routing scheme is measured in terms of its multiplicative stretch, called delay (or its additive stretch, called deviation). This is the maximum ratio (or surplus) between the length of a route produced by the scheme for some pair of vertices and their distance. Thus, the goal is, for a family of graphs, to find a routing labeling scheme with small stretch factor, relatively short labels and fast routing decision.

To obtain routing schemes for general graphs that use $o(n)$-bit labels for each vertex, one has to abandon the requirement that packets are always routed on shortest paths. Instead, one settles for the requirement that packets are routed on paths with relatively small stretch. Recently, the authors of [22] presented a routing scheme that uses $O(n^{1/2})$ bits of memory at each vertex of an $n$-vertex graph and has delay 3. Note that each routing decision takes constant time in their scheme. The space is optimal, up to a logarithmic factor, in the following sense: every routing scheme with delay $< 3$ must use, on some graphs, routing labels of total size $\Omega(n^2)$, and hence $\Omega(n)$ at some vertex (see [15]).

In [14,22], a shortest path routing labeling scheme for trees of arbitrary degree and diameter is described that assigns each vertex of an $n$-vertex tree an $O(\log^2 n / \log\log n)$-bit label. Given the label of a source vertex and the label of a destination, it is possible to compute, in constant time, the neighbor of the source that heads in the direction of the destination. This result for trees was recently used in [12,13] to design interesting low-deviation routing schemes for chordal graphs and general $c$-chordal graphs. [12] describes a routing labeling scheme for chordal graphs with deviation 2, labels of size $O(\log^{\ldots} n / \log\log n)$ bits per vertex and $O(1)$ routing decision. [13] describes a routing labeling scheme for the class of $c$-chordal graphs with deviation $2\lfloor c/2 \rfloor$, labels of size $O(\log^{\ldots} n)$ bits per vertex and $O(\log\log n)$ routing decision.
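Tree routing is the building block of these schemes. The Python sketch below uses classical DFS interval routing on a tree, a simpler substitute chosen only for illustration. It routes along the unique (shortest) tree path. However, each vertex stores the label intervals of all its children, so it does not achieve the $O(\log^2 n / \log\log n)$-bit labels or the constant-time decisions of [14,22]. All function names are hypothetical.

```python
from bisect import bisect_right

def build_interval_labels(tree_adj, root):
    """DFS preorder labels and subtree intervals for a tree {v: [neighbors]}."""
    label, last, parent = {root: 0}, {}, {root: None}
    kids = {v: [] for v in tree_adj}            # (child label, child), in label order
    next_label = 1
    stack = [(root, iter(tree_adj[root]))]
    while stack:
        v, it = stack[-1]
        for w in it:
            if w != parent[v]:
                parent[w] = v
                label[w] = next_label
                next_label += 1
                kids[v].append((label[w], w))
                stack.append((w, iter(tree_adj[w])))
                break
        else:                                    # all neighbors of v processed
            last[v] = next_label - 1             # v's subtree is [label[v], last[v]]
            stack.pop()
    return label, last, parent, kids

def next_hop(v, dest_label, label, last, parent, kids):
    """Routing decision at v: None if the packet has arrived, else the next neighbor."""
    if dest_label == label[v]:
        return None
    if label[v] < dest_label <= last[v]:         # destination lies below v
        starts = [s for s, _ in kids[v]]
        return kids[v][bisect_right(starts, dest_label) - 1][1]
    return parent[v]                             # otherwise move towards the root

def route(src, dst, tables):
    label, last, parent, kids = tables
    path, v = [src], src
    while True:
        w = next_hop(v, label[dst], label, last, parent, kids)
        if w is None:
            return path
        path.append(w)
        v = w

TREE = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
```

On the tree with edges 0-1, 0-2, 1-3, 1-4 rooted at 0, a packet from 3 to 2 travels 3, 1, 0, 2. Each decision uses only the destination label and the table stored at the current vertex.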

Our collective additive tree spanners give a much simpler, easier-to-understand means of constructing compact and efficient routing labeling schemes for all $(\alpha, r)$-decomposable graphs. We simply reduce the original problem to the problem on trees. The following result holds.

Theorem 11. Each $(\alpha, r)$-decomposable graph with $n$ vertices and $m$ edges admits a routing labeling scheme of deviation $2r$ with addresses and routing tables of size $O(\log^3 n / \log\log n)$ bits per vertex. Moreover, once computed by the sender in $\log_2 n$ time, headers never change, and the routing decision is made in constant time per vertex.

Projecting this theorem onto the particular graph classes considered in this paper, we obtain the following result:

- Any $c$-chordal graph (resp., chordal, chordal bipartite or cocomparability graph) admits a routing labeling scheme of deviation $2\lfloor c/2 \rfloor$ (resp., of deviation 2) with addresses and routing tables of size $O(\log^3 n / \log\log n)$ bits per vertex. Moreover, once computed by the sender in $\log_2 n$ time, headers never change, and the routing decision is made in constant time per vertex.
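The reduction to trees can be pictured as follows. The sender fixes a tree of the system that is good for the destination, and the packet is then routed inside that tree only. In this Python sketch we simply assume explicit rooted trees given as parent maps. The sender picks the tree with the smallest tree distance, which by the collective spanner property is at most $d_G(x, y) + 2r$. How the scheme of Theorem 11 finds a suitable tree from labels alone in $\log_2 n$ time is not reproduced here, and the function names are ours.

```python
def tree_distance(parent, x, y):
    """Distance between x and y in a rooted tree given as {vertex: parent}."""
    dist_from_x = {}
    v, d = x, 0
    while v is not None:              # record every ancestor of x with its distance
        dist_from_x[v] = d
        v, d = parent[v], d + 1
    v, d = y, 0
    while v not in dist_from_x:       # climb from y to the lowest common ancestor
        v, d = parent[v], d + 1
    return d + dist_from_x[v]

def choose_tree(trees, x, y):
    """Index of a tree minimizing d_T(x, y); it would go into the packet header."""
    return min(range(len(trees)), key=lambda i: tree_distance(trees[i], x, y))

# Two spanning paths of the 4-cycle 0-1-2-3-0, as parent maps.
T1 = {0: None, 1: 0, 2: 1, 3: 2}      # path 0-1-2-3, rooted at 0
T2 = {1: None, 2: 1, 3: 2, 0: 3}      # path 1-2-3-0, rooted at 1
```

For the pair $(0, 3)$ the sender selects $T_2$ (tree distance 1 instead of 3), and for $(0, 1)$ it selects $T_1$.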



6 Further Developments 

In forthcoming papers [9,10,11], we extend the method described in Section 2 and apply it to other families of graphs. These include homogeneously orderable graphs, AT-free graphs, graphs of bounded tree-width (including series-parallel graphs and outerplanar graphs), graphs of bounded asteroidal number, and others. We show that

- any homogeneously orderable graph admits a system of at most $\log_2 n$ collective additive tree 2-spanners,

- any AT-free graph admits a system of two collective additive tree 2-spanners,

- any graph whose asteroidal number is bounded by a constant admits a system of a constant number of collective additive tree 3-spanners,

- any graph whose tree-width is bounded by a constant admits a system of at most $O(\log_2 n)$ collective additive tree 0-spanners.

Note that, although the class of homogeneously orderable graphs is not hereditary, our ideas are still applicable.

We conclude this paper with two open questions:

1. What is the complexity of the problem "Given a graph $G$ and integers $\mu$, $r$, decide whether $G$ has a system of at most $\mu$ collective additive tree $r$-spanners" for different $\mu > 1$, $r > 0$ on general graphs and on different restricted families of graphs?

2. What is the best trade-off between the number of trees $\mu$ and the additive stretch factor $r$ on planar graphs? (So far, we can state only that any planar graph admits a system of $O(\sqrt{n} \log_2 n)$ collective additive tree 0-spanners.)
Abstract. Batching has been studied extensively in the offline case,
but many applications such as manufacturing or TCP acknowledgement 
require online solutions. 

We consider online batching problems, where the order of jobs to be 
batched is fixed and where we seek to minimize average flow time. We 
present optimally competitive algorithms for s-batch (competitive ratio 
2) and p-batch problems (competitive ratio of 4). We also derive results 
for naturally occurring special cases. In particular, we consider the case 
of unit processing times. 

Keywords: Design of Algorithms; Online Algorithms; Batching; TCP acknowledgement.



1 Introduction 

Batching problems are machine scheduling problems where a set of jobs $J = \{1, \ldots, n\}$ with processing times $p_i$, $i \in J$, has to be scheduled on a single machine. The set of jobs $J$ has to be partitioned into $J = \bigcup_{k=1}^{r} J_k$ to form a sequence of batches $J_1, \ldots, J_r$, for some integer $r$. A batch combines jobs to run jointly, and each job's completion time is defined to be the completion time of the entire batch. We assume that each batch requires a setup time $s$ when it is scheduled. In an s-batch problem the length of a batch is the sum of the processing times of the jobs in the batch. In a p-batch problem the length is the maximum of the processing times in the batch. We seek a schedule that minimizes the total flow time $\sum_i t_i$, where $t_i$ denotes the completion time of job $i$ in a given schedule. We consider the versions of the problems where the order of the jobs is given and fixed, and we refer to them as the list s-batch problem and the list p-batch problem, respectively. We remark that the term "s-batch" intuitively reflects the fact that in an s-batch problem the jobs are to be processed sequentially. In "p-batch" problems, by contrast, the jobs in each individual batch are to be processed on the machine in some parallel manner.
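The two cost models can be made concrete with a short Python sketch (function and parameter names are ours). Given the processing times in list order and the indices of the jobs that open a batch, it returns the total flow time $\sum_i t_i$ under the s-batch or the p-batch rule. The setup time $s$ is charged once per batch.

```python
def list_batch_flow_time(p, starts, s=1, mode="s"):
    """Total flow time of a list batching schedule.

    p      -- processing times in the fixed job order
    starts -- strictly increasing 0-based indices of the jobs that open a batch;
              starts[0] must be 0
    mode   -- "s": batch length = s + sum of its jobs; "p": s + max of its jobs
    """
    if (not starts or starts[0] != 0 or starts[-1] >= len(p)
            or any(a >= b for a, b in zip(starts, starts[1:]))):
        raise ValueError("starts must be strictly increasing, begin with 0 and index jobs")
    bounds = list(starts) + [len(p)]
    clock = total = 0
    for a, b in zip(bounds, bounds[1:]):
        batch = p[a:b]
        clock += s + (sum(batch) if mode == "s" else max(batch))
        total += clock * len(batch)    # every job of the batch completes at `clock`
    return total
```

For example, five unit jobs split as $\{1, 2\}, \{3, 4, 5\}$ cost $3 \cdot 2 + 7 \cdot 3 = 27$.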

We say that the batching problem is online if jobs arrive one by one and each job has to be scheduled before a new job is seen. By "a job has to be scheduled" we mean that the job either (a) is included in the current batch or (b) is scheduled as the first job of a new batch that is initialized at that point. An algorithm that follows this protocol is called an online algorithm for the batching problem. We say the online algorithm batches a job if it follows the action described in (b). If it follows the action in (a), we say the algorithm does not batch.

The quality of an online algorithm $A$ is measured by the competitive ratio. This is the worst-case ratio of the cost of $A$ to the cost of an optimal offline algorithm opt that knows the entire sequence of jobs in advance. We note that the offline list s-batch problem is a special case of the $1|s\text{-batch}|\sum C_i$ problem, which has been well studied and has a linear-time algorithm [3]. Many related offline problems have been studied as well; see, e.g., [4,2].
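For small instances, the offline list s-batch optimum can be computed with a simple quadratic dynamic program. This is a sketch of our own, not the linear-time algorithm of [3]. It relies on one observation: the length of a batch (setup plus processing) delays every job in that batch and in all later batches. Hence the total flow time equals the sum, over batches, of the batch length times the number of jobs from that batch to the end of the list.

```python
def offline_list_s_batch(p, s=1):
    """Optimal total flow time and batch start indices for the offline list s-batch problem.

    best[j] is the optimal cost of the suffix p[j:]; a batch opened at job j delays
    all n - j remaining jobs by its length. Runs in O(n^2) time.
    """
    n = len(p)
    prefix = [0]
    for x in p:
        prefix.append(prefix[-1] + x)
    best, nxt = [0] * (n + 1), [n] * (n + 1)
    for j in range(n - 1, -1, -1):
        best[j] = None
        for k in range(j + 1, n + 1):            # first batch of the suffix is p[j:k]
            cost = (s + prefix[k] - prefix[j]) * (n - j) + best[k]
            if best[j] is None or cost < best[j]:
                best[j], nxt[j] = cost, k
    starts, j = [], 0
    while j < n:                                 # follow the optimal choices
        starts.append(j)
        j = nxt[j]
    return best[0], starts
```

For three unit jobs the program returns cost 11 with batches $\{1, 2\}, \{3\}$. For five unit jobs it returns 26.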

An application of the problem is the following. Jobs (or processes) are to be run either on a single processor or on a large number of processors. Jobs are partitioned into batches that use a joint resource. The resources of each batch have to be set up before it can start. The successful processing of a batch is acknowledged when it terminates. A job may be seen as completed when an acknowledgement is sent (and not necessarily when its execution stops), which is done after all jobs of the batch are completed. The goal is to minimize the sum of flow times of all jobs. An s-batch simulates a single processor: at any time one job is run, and a batch is completed when all its jobs are completed. In other words, the time to run a batch is the sum of the processing times of its jobs. A p-batch simulates a multiprocessor system where each job may run on a different processor. Therefore the time to run a batch is the maximum processing time of any job in the batch.

Our problem is related to the TCP acknowledgement problem. With TCP, a single acknowledgement packet can be used to simultaneously acknowledge multiple arriving packets, thereby reducing the overhead of the acknowledgements. Dooly, Goldman, and Scott [5] introduced the dynamic TCP acknowledgement problem. Its goal is to minimize the number of acknowledgements sent plus the sum of the delays of all data packets, where the delay of a packet is the time gap between its arrival time and the time at which the acknowledgement is sent. That paper gave an optimally competitive algorithm (with competitive ratio 2) for this problem. Albers and Bals [1] derived tight bounds for a variation of the problem in which the goal is to minimize the number of acknowledgements sent plus the maximal delay incurred by any of the packets. A model where the arrival times of packets depend on the actions of the algorithm was studied in [9]. A more general problem, where a constant number of clients is to be served by a single server, was recently studied in [6].

As mentioned above, the s-batch problem is an online one-machine scheduling problem. The p-batch problem can be seen as such a problem as well, where the single machine is capable of processing several jobs in parallel. Since no release times are present, we can see the batching problems both as scheduling to minimize the sum of completion times and as scheduling to minimize the sum of flow times (the flow time of a job is its completion time minus its release time). However, both the classical "sum of completion times" problem and the "sum of flow times" problem are very different from the s-batch problem. For completion times, the optimal competitive ratio is 2 [10,11,12]. For flow times, the best competitive ratio can easily be shown to be linear in the number of jobs. In these two problems there are release times and no setup times, so there are no batches. Each job is run separately, and the jobs do not need to be assigned in the order they arrived. There are very few one-machine papers where an immediate decision on assignments is required. Fiat and Woeginger [8] studied one such model, where the goal is minimization of the total completion time. A single machine is available to be used starting at time zero. Each job has to be assigned (immediately upon arrival) to a slot of time. The length of this slot should be identical to the processing requirement of the job. However, idle times may be introduced, and the jobs can be run in any order. It was shown that the competitive ratio is strictly larger than logarithmic in the number of jobs $n$. On the other hand, for any $\varepsilon > 0$ there exists an algorithm with competitive ratio $(\log n)^{1+\varepsilon}$. Another immediate dispatching problem to minimize the sum of completion times (plus a penalty function) is studied in [7]. Jobs arrive one by one, and each job can be either accepted or rejected by paying some penalty (which depends on the job). The penalty function is simply the sum of the penalties of the rejected jobs.

The paper is organized as follows. Section 2 gives optimally competitive results for list s-batching. In Section 3, we deal with the important special case where all the processing requirements and also the setup time are equal. Section 4 gives optimal list p-batching results.



2 Optimally Competitive List s-Batching 

Throughout this paper we assume that the setup time is $s = 1$, since processing times can be scaled appropriately. We also make use of very short null jobs. We denote a null job by the symbol $\circ$, and a sequence of null jobs by $\circ_1, \circ_2, \ldots$. The length of a null job is $\varepsilon > 0$, where $\varepsilon$ is arbitrarily small, so we simplify our exposition by ignoring this quantity in the length of a schedule. In all of our proofs the $\varepsilon$ quantities could be introduced explicitly and a limit taken at the end, with no change to the results.

Theorem 2.1. The competitive ratio of any deterministic online algorithm for 
the list s-batch problem is no better than 2. 

Proof. We first show that for any deterministic online algorithm $A$ one can construct a request sequence on which the cost for $A$ is at least twice the cost of opt. Such a sequence is made up of a number of phases. Each phase consists of a number $L$ of null jobs, followed by a single job $\pi$ of length 1; we write $\sigma^L = \circ_1, \ldots, \circ_L, \pi$. An entire sequence is of the form

$$\rho_k = \sigma^{N}, \sigma^{N^2}, \ldots, \sigma^{N^k},$$

where $N$ is a large integer. Thus a sequence always consists of a number of phases whose length increases from phase to phase. A sequence of this form suffices to prove our result, for any $A$. The number of null jobs grows from phase to phase so that the total cost of all null jobs in earlier phases is of lower order than the cost of the null jobs in the current phase. An optimal offline solution knows how many phases there are and assigns all jobs into two batches. The first batch clearly starts at the beginning. The only other time the algorithm batches is just before the very last job. The goal of batching a second time is to pay less for the null jobs of the last phase. The coefficient of the highest power of $N$ in the cost is therefore simply the number of phases (the number of unit jobs in all phases but the last, plus one setup time).

Certainly $A$ must either

a) batch during every phase $i$ for $i < m$, for some $m \ll N$, or

b) have an earliest phase $i < m$ in which it does not batch.

Case a: In this case the sequence is chosen to be $\rho_m$. We have

$$\mathrm{cost}_{\mathrm{opt}} = N^m\bigl(1 + (m - 1)\bigr) + O(N^{m-1}).$$

We may w.l.o.g. assume that $A$ always batches towards the end of the phase, right before the job of length 1. Thus,

$$\mathrm{cost}_A = N^m\bigl(m + (m - 1)\bigr) + O(N^{m-1}).$$

The lower bound follows for $A$ since

$$\frac{m + (m - 1)}{1 + (m - 1)} \to 2$$

as $m$ increases.

Case b: Whenever $A$ batches, we may as before assume that $A$ batches towards the end of the phase, right before the job of length 1. Unlike before, in this case there exists a smallest $\ell$ such that $A$ does not batch in phase $\ell$. To obtain our result the sequence is chosen to be $\rho_\ell$. We have $\mathrm{cost}_{\mathrm{opt}} = N^\ell\bigl(1 + (\ell - 1)\bigr) + O(N^{\ell-1})$, whereas $\mathrm{cost}_A = N^\ell(\ell + \ell) + O(N^{\ell-1})$, yielding again the lower bound of 2.
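The adversary argument can be replayed numerically. In this Python sketch (names ours), each phase is stored as a group of $N^i$ null jobs followed by the unit job $\pi$. The null-job length $\varepsilon$ is taken to be 0, as in the limit used in the proof. The online algorithm batches right before $\pi$ in a chosen set of phases. It is compared with the two-batch schedule used in the proof, whose cost upper-bounds opt.

```python
def phase_sequence(N, k):
    """rho_k as (count, length) groups: N**i null jobs (length 0), then one unit job pi."""
    groups = []
    for i in range(1, k + 1):
        groups += [(N ** i, 0), (1, 1)]
    return groups

def grouped_flow_time(groups, opens, s=1):
    """s-batch flow time when a new batch is opened before each group index in `opens`."""
    clock = total = jobs = length = 0
    for g, (count, size) in enumerate(groups):
        if g in opens and jobs:           # close the currently open batch
            clock += s + length
            total += clock * jobs
            jobs = length = 0
        jobs += count
        length += count * size
    clock += s + length                   # close the last batch
    return total + clock * jobs

def adversary_ratio(N, k, online_phases):
    """Online cost / two-batch cost on rho_k. Online batches right before pi in the
    phases listed in online_phases; the reference schedule batches only before the last pi."""
    groups = phase_sequence(N, k)
    online = grouped_flow_time(groups, {2 * i - 1 for i in online_phases})
    reference = grouped_flow_time(groups, {2 * k - 1})
    return online / reference
```

With $N = 10^6$ and $k = 4$, batching in every phase gives a ratio close to $(m + (m - 1))/(1 + (m - 1)) = 7/4$ (case a). Skipping the batch in the last phase gives a ratio close to 2 (case b).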

We now present a class of algorithms, one of which achieves the competitive ratio of 2 and thus matches the lower bound. For each $B > 0$, we define algorithm Pseudobatch($B$). We call the algorithm with parameter value $B = 1$ simply "Pseudobatch", i.e., without any parameter. Algorithm Pseudobatch($B$) keeps a tally of processing thus far; we call the set of jobs associated with this tally the "pseudo batch". Once a job $a$ causes the processing requirements in the pseudo batch to exceed $B$, algorithm Pseudobatch($B$) batches, and the pseudo batch is cleared. This means that (a) $a$ is the first job in a new batch, and (b) the old tally, which contains $p_a$, is cleared, so $p_a$ is not part of the upcoming tally during the batch just opened. Consequently, in general the pseudo batches and the actual batches created by the algorithm are shifted by one job, and the very first job does not belong to any pseudo batch.
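Algorithm Pseudobatch($B$) translates directly into code. The Python sketch below (our own naming; setup time 1) returns the jobs that open a batch. For small random lists, it compares the resulting s-batch cost with a brute-force optimum over all $2^{n-1}$ batchings. By Theorem 2.2, the observed ratio for $B = 1$ should never exceed 2.

```python
import random

def pseudobatch_starts(p, B=1):
    """0-based indices of the jobs that open a batch under Pseudobatch(B)."""
    starts, tally = [0], 0          # job 0 opens batch 1 and joins no pseudo batch
    for i in range(1, len(p)):
        tally += p[i]
        if tally > B:               # job i pushes the pseudo batch over B:
            starts.append(i)        # (a) it opens a new batch,
            tally = 0               # (b) the tally, including p[i], is cleared
    return starts

def s_batch_cost(p, starts, s=1):
    """Total flow time of the s-batch schedule whose batches open at `starts`."""
    bounds, clock, total = starts + [len(p)], 0, 0
    for a, b in zip(bounds, bounds[1:]):
        clock += s + sum(p[a:b])
        total += clock * (b - a)
    return total

def brute_force_opt(p):
    """Optimal list s-batch cost by enumerating every set of batch boundaries."""
    n = len(p)
    return min(s_batch_cost(p, [0] + [i for i in range(1, n) if mask >> (i - 1) & 1])
               for mask in range(2 ** (n - 1)))

random.seed(1)
worst = 0
for _ in range(300):
    p = [random.choice([0.01, 0.2, 0.7, 1.5, 3.0]) for _ in range(random.randint(1, 8))]
    worst = max(worst, s_batch_cost(p, pseudobatch_starts(p)) / brute_force_opt(p))
print(worst)
```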

Theorem 2.2. The competitive ratio of algorithm Pseudobatch is no worse 
than 2. 

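Proof. For an optimal schedule with completion times $t_1^*, t_2^*, \ldots$ we have

$$t_i^* \ge 1 + S_i,$$

where $S_i = \sum_{j=1}^{i} p_j$.

For the algorithm, we have

$$t_i \le m_i + S_i + 1,$$

where $m_i$ is the number of batches up to job $i$, including the current batch. (The jobs that follow job $i$ in its batch belong to one pseudo batch and thus have total length at most 1.) We also have

$$m_i \le 1 + S_i.$$

Thus $t_i \le 2 + 2S_i \le 2t_i^*$, which implies the result.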

The next result shows that the exact competitiveness of 2 relies on the fact 
that the jobs may be arbitrarily small. Indeed, it is easy to show that if there is a 
lower bound on the size of the jobs then it is possible to construct an algorithm 
with competitiveness better than two. 



Theorem 2.3. If the processing time of every job is at least $p$, then there is a $C$-competitive online algorithm, where $C = 1 + 1/\sqrt{p + 1}$.



Proof. We consider Pseudobatch($B$) with $B = \sqrt{p + 1}$. Note that $C = 1 + 1/B$. We prove that Pseudobatch($B$) is $C$-competitive, given that $p_i \ge p$ for all $i$.

As before, let $S_i = \sum_{j=1}^{i} p_j$. Let $t_i^*$ be the completion time of the $i$-th job in the optimal schedule and $t_i$ its completion time in the schedule created by Pseudobatch($B$). Let $m_i$ be the number of batches, up to and including the batch that contains the $i$-th job, in the schedule created by Pseudobatch($B$). We shall prove that

$$\frac{t_i}{t_i^*} \le C \qquad (1)$$

for all $i$.
The $i$-th job is in the $m_i$-th batch. Say that the $\ell$-th job is the last job in that batch. Then $S_\ell - S_i \le B$, since jobs $i + 1, \ldots, \ell$ must be in the same pseudo batch.

As before, $t_i^* \ge S_i + 1$, and $t_i = S_\ell + m_i$. By the definition of the algorithm Pseudobatch($B$), we have $m_i \le (S_i - p_i)/B + 1$. Since $p_i \ge p$ and $B^2 = p + 1$, we have

$$\frac{p_i + 1 + B}{p_i + 1} \le \frac{p + 1 + B}{p + 1} = 1 + \frac{B}{p + 1} = C.$$

Recall that $1 + 1/B = C$. We have:

$$\begin{aligned}
\frac{t_i}{t_i^*} &\le \frac{S_\ell + m_i}{S_i + 1} \le \frac{S_i + m_i + B}{S_i + 1} = \frac{S_i - p_i + m_i + p_i + B}{S_i - p_i + 1 + p_i} \\
&\le \frac{(1 + 1/B)(S_i - p_i) + 1 + p_i + B}{S_i - p_i + 1 + p_i} \le \frac{C(S_i - p_i) + C(1 + p_i)}{S_i - p_i + 1 + p_i} = C.
\end{aligned}$$

This verifies (1) for each $i$, and we are done.

3 Identical Job Sizes 

For machine scheduling problems, restricting to unit jobs typically makes the problem easier to analyze. However, this is not the case for list batching. In this section we give results for the case $s = 1$ and $p_j = 1$ for all $j \in J$. In this case we can give an exact description of the optimal offline solution. To this end, define



optcost[n] = optimal cost of n jobs,
firstbatch[n] = size of the first batch for n jobs.



We have the following recursive definition of optcost[n]:

$$\mathrm{optcost}[n] = \begin{cases} 0 & \text{for } n = 0,\\ \min_{0 \le p < n} \bigl\{ \mathrm{optcost}[p] + n(n - p + 1) \bigr\} & \text{for all } n > 0. \end{cases}$$



To see this, let $n - p$ be the number of items in the first batch. Then $p$ is the number of items in the remaining batches, which we assume are processed optimally. The cost of processing the first batch is $(n - p)(n - p + 1)$. The cost of processing all remaining batches is $\mathrm{optcost}[p] + (n - p + 1)p$. Adding the two terms gives $\mathrm{optcost}[p] + n(n - p + 1)$.
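The recursion can be evaluated directly by dynamic programming. This Python sketch (names ours) computes optcost and one optimal value of firstbatch for unit jobs with $s = 1$. When several first batches are optimal, it returns the largest one.

```python
def unit_optcost(n_max):
    """optcost[n] and one optimal firstbatch[n] for n unit jobs with setup s = 1."""
    optcost, firstbatch = [0] * (n_max + 1), [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # keep p jobs for later; the first batch of n - p jobs ends at time n - p + 1
        p = min(range(n), key=lambda q: optcost[q] + n * (n - q + 1))
        optcost[n] = optcost[p] + n * (n - p + 1)
        firstbatch[n] = n - p
    return optcost, firstbatch

optcost, firstbatch = unit_optcost(10)
print(optcost)     # [0, 2, 6, 11, 18, 26, 35, 46, 58, 71, 85]
```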
We define a function $F[n]$, for $n \ge 0$, as follows. If $n = m(m+1)/2 + k$ for some $m \ge 0$ and some $0 \le k \le m + 1$, then

$$F[n] = \frac{m(m+1)(m+2)(3m+5)}{24} + k(n + m - k + 1) + \frac{k(k+1)}{2}. \qquad (2)$$

In the special case that $n = m(m+1)/2$ for some $m > 0$, the rule gives two different formulae for $F[n]$, namely with $(m, 0)$ and with $(m - 1, m)$. Routinely, we verify that the values are equal; in fact, they are both equal to $m(m+1)(m+2)(3m+5)/24$.

The following facts are useful in describing the offline solution in closed form:

Lemma 3.1. a) If $n = m(m+1)/2 + k$ where $0 \le k < m + 1$, then $F[n+1] - F[n] = n + m + 2$.

b) If $n = m(m+1)/2$ for some $m > 0$, then $F[n] - F[n-1] = n + m$.

c) If $n \ge 1$, then $F[n+1] + F[n-1] > 2F[n]$.

Proof. We first prove part a). To that end, let

$$F[n] = \frac{m(m+1)(m+2)(3m+5)}{24} + k(n + m - k + 1) + \frac{k(k+1)}{2}$$

and

$$F[n+1] = \frac{m(m+1)(m+2)(3m+5)}{24} + (k+1)(n + m - k + 1) + \frac{(k+1)(k+2)}{2}.$$

Then

$$F[n+1] - F[n] = (n + m - k + 1) + (k + 1) = n + m + 2,$$

which proves part a).

For part b), simply write $n = (m-1)m/2 + m$, then apply part a) to $n - 1$.

In the proof of part c) we consider two cases:

Case 1: $n = m(m+1)/2 + k$ where $0 < k < m + 1$. Applying part a) twice,

$$F[n+1] + F[n-1] - 2F[n] = (n + m + 2) - (n + m + 1) = 1.$$

Case 2: $n = m(m+1)/2$ for some $m > 0$. Then $F[n+1] - F[n] = n + m + 2$ by part a), and $F[n] - F[n-1] = n + m$ by part b). Thus

$$F[n+1] + F[n-1] - 2F[n] = 2.$$

We are now ready to give the closed form:

Theorem 3.2. For all $n \ge 0$, $\mathrm{optcost}[n] = F[n]$. Furthermore, let $n = m(m+1)/2 + k$ for some $m \ge 0$ and some $0 \le k \le m + 1$. Then the optimal size of the first batch is $m$ if $k = 0$, is $m + 1$ if $k = m + 1$, and is either $m$ or $m + 1$ if $0 < k < m + 1$.
Proof. We first show $\mathrm{optcost}[n] \le F[n]$ for all $n$ by strong induction on $n$.

If $n = 0$ we are done. If $n > 0$, select $m > 0$ and $0 \le k < m + 1$ such that $n = m(m+1)/2 + k$.

We show

$$F[n] = F[n - m] + n(m + 1). \qquad (3)$$

To show equation (3), note

$$F[n] = \frac{m(m+1)(m+2)(3m+5)}{24} + k(n + m - k + 1) + \frac{k(k+1)}{2}.$$

Since $n - m = (m-1)m/2 + k$ with $0 \le k \le m$, we get

$$F[n - m] = \frac{(m-1)m(m+1)(3m+2)}{24} + k(n - k) + \frac{k(k+1)}{2}.$$

Then

$$F[n] - F[n - m] - n(m + 1) = \frac{m(m+1)(12m + 12)}{24} + k(m + 1) - n(m + 1) = (m + 1)\Bigl(\frac{m(m+1)}{2} + k - n\Bigr) = 0.$$

This establishes equation (3).

By the inductive hypothesis, $F[n - m] \ge \mathrm{optcost}[n - m]$. By definition, $\mathrm{optcost}[n] \le \mathrm{optcost}[n - m] + n(m + 1)$. Then $F[n] = F[n - m] + n(m + 1) \ge \mathrm{optcost}[n - m] + n(m + 1) \ge \mathrm{optcost}[n]$. This completes the proof that $\mathrm{optcost}[n] \le F[n]$ for all $n$.

We now prove that indeed $\mathrm{optcost}[n] = F[n]$ holds. This is also by strong induction. The case $n = 0$ is trivial. For fixed $n > 0$, we now define

$$G[p] = F[p] + n(n - p + 1) \quad \text{for all } p < n.$$

It is easy to see that

$$G[p+1] + G[p-1] > 2G[p] \quad \text{for all } p > 0, \qquad (4)$$

since the second term in the definition of $G$ is linear in $p$, and by Lemma 3.1, part c). We use this convexity to find the minimum of the function $G[p]$. In each of the following cases we show what the minimum is.

Let $n > 0$. Write $n = m(m+1)/2 + k$, where $m > 0$ and $0 \le k < m + 1$.

Case 1: $k = 0$. Then let $p = n - m = (m-1)m/2$. By the definition of $G$ and by Lemma 3.1, part a), we have

$$G[p+1] - G[p] = 1.$$

By the definition of $G$ and by Lemma 3.1, part b), we have

$$G[p] - G[p-1] = -1.$$

We can easily check that $F[n] = G[p]$. By equation (4) and by the inductive hypothesis, it follows that $\mathrm{optcost}[n] = F[n]$ and that $\mathrm{firstbatch}[n] = m$.
Case 2: $k > 0$. Then let $p = n - m - 1 = (m-1)m/2 + k - 1$. By the definition of $G$ and by Lemma 3.1, part a), we have

$$G[p+1] - G[p] = 0.$$

By the definition of $G$ and by Lemma 3.1, part a), we have

$$G[p+2] - G[p+1] \ge 1$$

(with equality for $k < m$, and the value 2 for $k = m$).

If $k = 1$, by the definition of $G$ and by Lemma 3.1, part b), we have

$$G[p] - G[p-1] = -2.$$

If $k > 1$, by the definition of $G$ and by Lemma 3.1, part a), we have

$$G[p] - G[p-1] = -1.$$

We can easily check that

$$F[n] = G[p] = G[p+1].$$

It follows from the inductive hypothesis and equation (4) that $\mathrm{optcost}[n] = F[n]$ and that $\mathrm{firstbatch}[n] = m$ or $m + 1$.
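Theorem 3.2 can be cross-checked numerically against the recursion. The Python sketch below (names ours) evaluates the closed form (2), choosing $m$ maximal with $m(m+1)/2 \le n$ so that $0 \le k \le m$. It then compares both the optimal cost and the set of all optimal first-batch sizes with a dynamic program.

```python
from math import isqrt

def F(n):
    """Closed form (2), with m maximal such that m(m+1)/2 <= n (so 0 <= k <= m)."""
    m = (isqrt(8 * n + 1) - 1) // 2
    k = n - m * (m + 1) // 2
    return m * (m + 1) * (m + 2) * (3 * m + 5) // 24 + k * (n + m - k + 1) + k * (k + 1) // 2

def optimal_first_batches(n_max):
    """optcost[n] and the set of all optimal first-batch sizes, by the recursion."""
    opt = [0] * (n_max + 1)
    sizes = [set() for _ in range(n_max + 1)]
    for n in range(1, n_max + 1):
        costs = [opt[p] + n * (n - p + 1) for p in range(n)]
        opt[n] = min(costs)
        sizes[n] = {n - p for p, c in enumerate(costs) if c == opt[n]}
    return opt, sizes

def mismatches(n_max):
    """Values n <= n_max where Theorem 3.2 and the dynamic program disagree."""
    opt, sizes = optimal_first_batches(n_max)
    bad = []
    for n in range(1, n_max + 1):
        m = (isqrt(8 * n + 1) - 1) // 2
        k = n - m * (m + 1) // 2
        expected = {m} if k == 0 else {m, m + 1}
        if opt[n] != F(n) or sizes[n] != expected:
            bad.append(n)
    return bad

print(mismatches(300))   # Theorem 3.2 predicts an empty list
```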

Our next goal is to find a lower bound on the competitive ratio of any algorithm for the unit jobs case and to give an algorithm that achieves this ratio.

Define $\mathcal{D}$ to be the algorithm which batches after jobs 2, 5, 9, 13, 18, 23, 29, 35, 41, 48, 54, 61, 68, 76, 84, 91, 100, 108, 117, 126, 135, 145, 156, 167, 179, 192, 206, 221, 238, 257, 278, 302, 329, 361, 397, 439, 488, 545, 612, 690, 781, 888, 1013, 1159, 1329, 1528, 1760, and $2000 + 40i$ for all $i \ge 0$.
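The behavior of $\mathcal{D}$ on short sequences can be checked as described in the proof of Theorem 3.3. The Python sketch below (our own naming) encodes the batch boundaries listed above and computes the cost of $\mathcal{D}$ on $n$ unit jobs with setup 1. It compares this cost with the optimal offline cost from the recursion for optcost. Calling worst_ratio(1999) performs the quadratic-time check over all sequences of fewer than 2000 jobs.

```python
from fractions import Fraction

# Batch boundaries of algorithm D: a new batch starts right after these jobs.
D_PREFIX = [2, 5, 9, 13, 18, 23, 29, 35, 41, 48, 54, 61, 68, 76, 84, 91, 100, 108,
            117, 126, 135, 145, 156, 167, 179, 192, 206, 221, 238, 257, 278, 302,
            329, 361, 397, 439, 488, 545, 612, 690, 781, 888, 1013, 1159, 1329,
            1528, 1760]

def d_boundaries(n):
    """Boundaries of D strictly before job n, including 2000 + 40i for i >= 0."""
    return [b for b in D_PREFIX if b < n] + list(range(2000, n, 40))

def online_cost(n):
    """Cost of D on n unit jobs (setup 1); the last batch ends with job n."""
    clock = total = prev = 0
    for end in d_boundaries(n) + [n]:
        size = end - prev
        clock += 1 + size                 # setup plus one unit per job
        total += clock * size
        prev = end
    return total

def unit_optcost(n_max):
    opt = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        opt[n] = min(opt[p] + n * (n - p + 1) for p in range(n))
    return opt

def worst_ratio(n_max):
    """Largest online/offline cost ratio of D over 1..n_max unit jobs, as a Fraction."""
    opt = unit_optcost(n_max)
    return max(Fraction(online_cost(n), opt[n]) for n in range(1, n_max + 1))
```

For instance, on nine jobs $\mathcal{D}$ uses batches of sizes 2, 3 and 4 and pays 75, against an optimum of 71.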

Theorem 3.3. For the list batching problem restricted to unit job sizes, no online algorithm can have a competitive ratio smaller than 619/583, and the algorithm described above achieves this ratio.

Proof. Any online algorithm for list batching restricted to unit jobs is described by a sequence of decisions: should the job be the first job in a new batch? In other words, every online algorithm is a path in a decision tree where a node at level $i$ has two children. One represents the choice not to batch prior to job $i$, and the other represents making job $i$ the first job in a new batch. However, having an empty batch only increases an algorithm's cost, so the first job should begin the first batch (i.e., we should not close the first batch prior to the first job).

Suppose that any path from the root to a node at depth $d$ in the decision tree must encounter a node at which the ratio of online cost to offline cost is at least 619/583. Then we have established our lower bound. Using a small computer program, it is easy to verify that this holds for $d = 100$. What is unusual is that considering fewer than 100 jobs does not yield the bound.
[Figure: Pruned Decision Tree]
Fig. 1. The Decision Tree used in the Pruning Procedure



Consider the algorithm $\mathcal{D}$ described above. Verifying that $\mathcal{D}$ maintains a cost ratio of at most 619/583 for all job sequences with fewer than 2000 jobs is tedious but trivial for a computer program. For sequences with more than 2000 jobs we note two facts. First, the contribution of the first 2000 jobs to the optimal cost can only increase, because the size of the optimal batches increases with the number of jobs. Second, the contribution of job $i > 2000$ to the optimal cost is at least $i + 1$, while its contribution to the online cost is at most $i + 48 + (i - 2000)/40 < (619/583)\, i$. Therefore $\mathcal{D}$ is 619/583-competitive.

Given that there are exponentially many paths from the root to a node
at depth d, two notes on efficiency are appropriate here. First, if a node is
encountered where the ratio of costs is greater than or equal to 619/583, then
no further descendants need to be checked. This alone brings the calculation
described above to manageable levels. Second, given two nodes n1 and n2 which
have not been pruned by the previous procedure, if the online cost at n1 is less
than or equal to the online cost at n2 and both have done their most recent batching
at the same point, then descendants of n2 need not be considered. This follows
because the cost on any sequence of choices leading from n2 is greater than or equal
to the same cost on n1. We illustrate the preceding ideas with the diagram of
Figure 1. Level i corresponds to all possible decisions after i jobs have arrived.
We can prune at level 3 because 12/11 > 619/583, and descendants of the starred
node need not be considered.
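The pruned search can be sketched as a level-by-level exploration of the decision tree. This is a minimal sketch under stated assumptions, not the authors' program: it assumes the unit-job s-batch model (setup time 1 and unit processing times, so a batch of b jobs lasts 1 + b and all of its jobs complete when it ends), jobs revealed one at a time, and total completion time as the cost. Under these assumptions OPT(3) = 11 and three of the four level-3 nodes have online cost 12, matching the ratio 12/11 quoted for Figure 1. The helper names `opt_costs` and `first_forcing_depth` are hypothetical. As a conservative choice of ours, the dominance rule only compares nodes that agree on both the most recent batching point and the number of setups, so that their descendants face identical cost increments.

```python
from typing import Dict, List, Optional, Tuple


def opt_costs(n: int) -> List[int]:
    """OPT(m) for m = 0..n: least total completion time of m unit jobs
    batched in list order, under the assumed unit-job s-batch model."""
    costs = [0]
    for m in range(1, n + 1):
        # g[j]: cheapest cost charged by batches covering jobs j..m-1 (0-based).
        # A batch of b jobs lasts 1 + b and delays all m - j unfinished jobs.
        g = [0] * (m + 1)
        for j in range(m - 1, -1, -1):
            g[j] = min((1 + b) * (m - j) + g[j + b] for b in range(1, m - j + 1))
        costs.append(g[0])
    return costs


def first_forcing_depth(num: int, den: int, max_depth: int) -> Optional[int]:
    """Smallest depth by which every path of the online decision tree has met a
    node with online/OPT >= num/den, or None if some path avoids such nodes."""
    opt = opt_costs(max_depth)
    # A surviving node is keyed by (setups so far, size of the open batch) and
    # stores the cheapest online cost seen for that key at the current depth.
    level: Dict[Tuple[int, int], int] = {(0, 0): 0}
    for i in range(max_depth):  # i = number of jobs already placed
        nxt: Dict[Tuple[int, int], int] = {}
        for (s, b), cost in level.items():
            # Open a new batch: the new job finishes at (s + 1) + (i + 1).
            moves = [(s + 1, 1, cost + s + i + 2)]
            if b > 0:
                # Extend the open batch: its b jobs finish one unit later and
                # the new job finishes at time s + i + 1.
                moves.append((s, b + 1, cost + b + s + i + 1))
            for s2, b2, c2 in moves:
                if den * c2 >= num * opt[i + 1]:
                    continue  # ratio reached: no descendants need checking
                if c2 < nxt.get((s2, b2), c2 + 1):
                    nxt[(s2, b2)] = c2  # dominance: keep the cheapest node
        if not nxt:
            return i + 1
        level = nxt
    return None
```

Calling `first_forcing_depth(619, 583, 100)` runs the depth-100 check described above under these modeling assumptions; whether it reproduces the stated result depends on this model matching the paper's exactly.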

4 The List p-Batch Problem 

We now turn to the list p-batch problem and define a class of algorithms,
Threshold, one of which has an optimal competitive ratio. For a sequence
$A = \langle a_1, a_2, \ldots \rangle$ we define Threshold($A$) to be the algorithm that batches for the
$i$-th time whenever the processing requirement is larger than or equal to the threshold value $a_i$.
Specifically, we consider the sequence $A^* = \langle (i+1)2^i - 1,\ i = 1, 2, \ldots \rangle$, and
we write “Threshold”, i.e., without any parameter, to mean Threshold($A^*$).
We have: 

Theorem 4.1. The competitive ratio of algorithm Threshold is no worse 
than 4. 

Proof. Consider a job j which is in the ℓ-th batch of the online algorithm. This
single job contributes at most $\ell + \sum_{i=1}^{\ell} a_i$ to the online cost, because there have
been at most ℓ setup times and the length of batch i is at most $a_i$. On the other
hand, it contributes at least $1 + a_{\ell-1}$ to the offline cost. The calculation below
shows that the ratio of these two quantities is 4.

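$$\frac{\ell + \sum_{i=1}^{\ell} a_i}{1 + a_{\ell-1}}
= \frac{\ell + \sum_{i=1}^{\ell} \left[(i+1)2^{i} - 1\right]}{1 + \ell\,2^{\ell-1} - 1}
= \frac{\left[(\ell-1)2^{\ell+1} + 2\right] + \left[2^{\ell+1} - 2\right]}{\ell\,2^{\ell-1}}
= \frac{\ell\,2^{\ell+1}}{\ell\,2^{\ell-1}} = 4$$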
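The two closed-form sums behind this calculation, and the resulting value 4, can be checked with exact rational arithmetic. A small sketch; the function names `a` and `per_job_ratio` are ours, and `a(0)` evaluates to 0, so the case ℓ = 1 is covered as well:

```python
from fractions import Fraction


def a(i: int) -> int:
    """Threshold value a_i = (i + 1) * 2**i - 1 of the sequence A* (a_0 = 0)."""
    return (i + 1) * 2 ** i - 1


def per_job_ratio(ell: int) -> Fraction:
    """(ell + sum_{i=1}^{ell} a_i) / (1 + a_{ell-1}), the per-job bound of the proof."""
    online = ell + sum(a(i) for i in range(1, ell + 1))
    offline = 1 + a(ell - 1)
    return Fraction(online, offline)


for ell in range(1, 6):
    # Closed forms used in the derivation.
    assert sum(i * 2 ** i for i in range(1, ell + 1)) == (ell - 1) * 2 ** (ell + 1) + 2
    assert sum(2 ** i for i in range(1, ell + 1)) == 2 ** (ell + 1) - 2
    print(ell, per_job_ratio(ell))
```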

We now show that algorithm Threshold is optimally competitive:

Theorem 4.2. No deterministic online algorithm for the list p-batch problem
can have competitiveness less than 4.

Proof. We only give a sketch of the proof here; details will be in the full paper.
We prove that for any δ > 0 there is no (4 − δ)-competitive algorithm. To this end,
we show that for any deterministic online algorithm A one can construct a request
sequence such that the cost for A cannot be better than (4 − δ) times the cost of
opt on that sequence. Fix γ > 0 and let N be a very large integer such that 1/γ ≪ N.
Define now $k_0 = 0$. Let $J_k$ denote a job with processing requirement $k\gamma$. Then we
construct a sequence of such jobs. More precisely, we define the sequence to be

$$J^{N}\; J_{1}\; J_{2} \cdots J_{k_1}\; J_{k_1+1} \cdots J_{k_2} \cdots$$

where $J_{k_\ell}$ is the first job in batch ℓ + 1 for the online algorithm, and it is stipulated
that the sequence may terminate after job $J_{k_\ell}$ for any ℓ.

Then opt can serve all jobs with two batches (the second of which ends with $J_{k_\ell}$).
Therefore, to be (4 − δ)-competitive, the following inequality must hold for all ℓ:

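$$N\Bigl(\ell + \sum_{i=1}^{\ell} (f_i - \gamma)\Bigr) + (\text{lower-order terms}) \;\le\; (4 - \delta)\,N\,(1 + f_{\ell-1}) + (\text{lower-order terms})$$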
where, for simplicity, we define $f_i = k_i\gamma$. If γ is chosen sufficiently small and N is
chosen sufficiently large, these inequalities require that

$$\ell + \sum_{i=1}^{\ell} f_i \;\le\; \left(4 - \frac{\delta}{2}\right)(1 + f_{\ell-1})$$

hold for all ℓ. Further, the sequence $f_i$, i = 1, 2, ..., must be increasing by
definition. A simple variational argument shows that these inequalities have a
solution iff there is a solution with all inequalities tight. The resulting equalities
can be solved using recurrences. We find that the unique solution yields values
of $f_i$ which are not monotone increasing. We conclude that there can be no
(4 − δ)-competitive algorithm. We mention that if the multiplicative factor
(4 − δ/2) is replaced by 4, then there is a solution which is monotone increasing.
In fact, in this case, the values of the $f_i$ are the $a_i$ which define Threshold.
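The tight system can be explored numerically. Subtracting the tight constraint for ℓ − 1 from the one for ℓ gives the recurrence 1 + f_ℓ = c(f_{ℓ−1} − f_{ℓ−2}), where c stands for the multiplicative factor. A minimal sketch, assuming f_0 = 0 (which is what makes the c = 4 solution coincide with the thresholds a_i of Threshold); `tight_solution` and `first_decrease` are hypothetical names. For 0 < c < 4 the characteristic polynomial x² − cx + c has complex roots, so the tight solution oscillates and eventually stops increasing.

```python
from fractions import Fraction
from typing import List, Optional


def tight_solution(c: Fraction, length: int) -> List[Fraction]:
    """f_0..f_length with every constraint tight:
    ell + sum_{i=1}^{ell} f_i = c * (1 + f_{ell-1}), ell = 1..length, f_0 = 0."""
    f = [Fraction(0)]
    for ell in range(1, length + 1):
        if ell == 1:
            f.append(c * (1 + f[0]) - 1)
        else:
            # Difference of consecutive tight constraints:
            # 1 + f_ell = c * (f_{ell-1} - f_{ell-2}).
            f.append(c * (f[ell - 1] - f[ell - 2]) - 1)
    return f


def first_decrease(c: Fraction, length: int) -> Optional[int]:
    """First ell with f_ell <= f_{ell-1}, or None if f_0..f_length increase."""
    f = tight_solution(c, length)
    for ell in range(1, length + 1):
        if f[ell] <= f[ell - 1]:
            return ell
    return None
```

With c = 4 the first values are 0, 3, 11, 31, 79, 191, i.e., the thresholds $a_i = (i+1)2^i - 1$; with c = 39/10, `first_decrease` reports an index at which the tight solution stops increasing.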

5 Conclusion 

For the s-batch problem we showed tight bounds of 2. Both the upper bound
and the lower bound follow from a ratio of two in the completion times of the
algorithm compared to the optimal offline schedule, not only for the total cost,
but for each job separately. Therefore those bounds hold for a larger class of
goal functions, including weighted total flow time and the $\ell_p$-norm of flow times. We
note that our results for identical job sizes are obtained for the case that the job
size equals the setup time. These techniques can easily be applied to cases where
the two values are different. We also studied the p-batch problem, for which we
proved tight bounds of 4.
Abstract. The relative worst order ratio is a new measure for the quality of on-line
algorithms, which has been giving new separations and even new algorithms
for a variety of problems. Here, we apply the relative worst order ratio to the seat 
reservation problem, the problem of assigning seats to passengers in a train. For 
the unit price problem, where all tickets have the same cost, we show that First-Fit 
and Best-Fit are better than Worst-Fit, even though they have not been separated 
using the competitive ratio. The same relative worst order ratio result holds for the 
proportional price problem, where the ticket price is proportional to the distance 
travelled. In contrast, no deterministic algorithm has a competitive ratio, or even 
a competitive ratio on accommodating sequences, which is bounded below by a 
constant. It is also shown that the worst order ratio for seat reservation algorithms 
is very closely related to the competitive ratio on accommodating sequences. 



1 Introduction 

The standard measure for the quality of on-line algorithms is the competitive ratio [14,23, 
17], which is, roughly speaking, the worst-case ratio, over all possible input sequences, 
of the on-line performance to the optimal off-line performance. In many cases, the com- 
petitive ratio is quite successful in predicting the performance of algorithms. However, 
in many others, it gives results that are either counter-intuitive or counter to the exper- 
imental data. There is therefore a need to develop performance measures that would 
supplement the competitive ratio. 

The competitive ratio resembles the approximation ratio, which is not surprising 
as on-line algorithms can be viewed as a special case of approximation algorithms. 
However, while it seems natural to compare an approximation algorithm to an optimal 
algorithm, which solves the same problem in unlimited time, it does not seem as natural 
to compare an on-line algorithm to an off-line optimal algorithm, which actually solves 
a different problem (an off-line version). Additionally, when there is a need to compare
two on-line algorithms against each other, it seems more appropriate to compare them 
directly, rather than involve an intermediate comparison to an optimal off-line algorithm. 

For this reason, a new performance measure for the quality of on-line algorithms 
has been developed [6]. This measure, the relative worst order ratio, allows on-line 
algorithms to be compared directly to each other. It combines the desirable properties of 
some previously considered performance measures, namely the Max/Max ratio [5] and
the random order ratio [18]. The Max/Max ratio allows direct comparison of two on- 
line algorithms, without the intermediate comparison to OPT. The random order ratio, 
on the other hand, is the worst-case ratio of the expected performance of an algorithm 
on a random permutation of an input sequence, compared with an optimal solution. 
To compare two algorithms using the relative worst order ratio, we consider a worst- 
case sequence and take the ratio of how the two algorithms do on their respective worst 
orderings of that sequence. Though intended for direct comparison of on-line algorithms, 
the relative worst order ratio may also be used to compare an on-line algorithm to the 
optimal off-line algorithm, in which case it more closely parallels the competitive ratio. 
We then refer to the ratio as simply the worst order ratio. 

The relative worst order ratio has already been applied to some problems and has led 
to more intuitively and/or experimentally correct results than the competitive ratio, as 
well as to new algorithms. For paging, in contrast to the competitive ratio, it has shown 
that Least-Recently-Used (LRU) is strictly better than Flush-When-Full (FWF) and that
look-ahead helps [8], both results being consistent with intuition and practice. Addition- 
ally, although LRU is an optimal deterministic algorithm according to the competitive 
ratio, a new algorithm RLRU has been discovered, which not only has a better relative 
worst order ratio than LRU, but is experimentally better as well according to initial 
testing [8]. Other problems where the relative worst order ratio has given more correct 
results are bin packing [6,7], scheduling [12], and bin coloring [20]. 

Given these encouraging results, this paper will use the relative worst order ratio 
to analyze algorithms for the seat reservation problem. This problem is defined in [10] 
as the problem of assigning passengers to seats on a train with n seats and k stations 
en-route, in an on-line manner. We focus on deterministic algorithms, although random- 
ized algorithms for this problem have also been studied [10,3]. Three algorithms are 
studied: First-Fit, Best-Fit, and Worst-Fit. There are two variants of the seat reservation 
problem: the unit price problem and the proportional price problem. For both variants, 
the competitive ratio is O(1/k) for all deterministic algorithms [10], and thus not bounded
below by a constant independent of k (recall that for a maximization problem, a low 
competitive ratio implies a bad algorithm). No pair of algorithms has been conclusively 
separated using the competitive ratio. 

Using the relative worst order ratio, we are able to differentiate all three algorithms, 
for both the unit price and the proportional price problems. We show that for a category 
of algorithms called Any-Fit, which includes both First-Fit and Best-Fit, First-Fit is at 
least as good as any other algorithm. Moreover, First-Fit is strictly better than Best-Fit 
with a relative worst order ratio of at least 4/3 for the unit price problem and at least
for the proportional price problem. We also show that Worst-Fit is at least as bad as any 
other deterministic algorithm, and is strictly worse than any Any-Fit algorithm by a ratio 
of at least 2 − 1/(k − 1) for the unit price problem and exactly k − 1 for the proportional price
problem. 

Additionally, we find that, for the seat reservation problem, an algorithm’s worst order
ratio is bounded from above by the competitive ratio on accommodating sequences¹

¹ The competitive ratio on accommodating sequences was first studied in [10], but called the
accommodating ratio there.
(defined below) for the algorithm and bounded below by the competitive ratio on
accommodating sequences for some algorithm. This gives bounds for the worst order ratio
of ^ < r < ^ + 2 k+&n-(fi+ 2 c) ’ where c = fc — 1 (mod 6), for the unit price problem. 
This is a more useful estimate of how an algorithm performs than the competitive ratio, 
which is not bounded below by a constant. 



2 The Seat Reservation Problem 

The seat reservation problem [10] concerns a scenario where a train with n seats travels
on a route passing through k ≥ 2 stations, including the first and the last. The seats are
numbered from 1 to n. The start station is station 1 and the end station is station k. A
customer may, any time prior to departure, request a ticket for travel between stations s
and f, where 1 ≤ s < f ≤ k. At that time, the customer is assigned a single seat number,
which cannot be changed. It is the role of the algorithm (ticket agent) to determine which
seat number to assign. The customer may be refused a ticket only in the case when there
is no single seat which is empty for the duration of the request. An algorithm which
obeys this rule is called fair, and all algorithms for this problem must be fair.

The seat reservation problem is, by its very nature, an on-line problem. An algorithm 
attempts to maximize income, i.e., the total price of the tickets sold, so the performance 
of an algorithm depends on the ticket pricing policy. We consider two variants: In the 
unit price problem, the price of all tickets is the same. In the proportional price problem, 
the price of a ticket is directly proportional to the distance travelled. Some of the results 
we prove hold for any pricing policy where all tickets have positive cost; we refer to 
such results as holding “regardless of pricing policy.” 

The seat reservation problem can be viewed as an interval graph coloring problem 
[15], with the assignment of seat numbers corresponding to the assignment of colors. An 
optimal on-line algorithm for the standard interval graph coloring problem, which tries to 
minimize the number of colors used, instead of maximizing the number of intervals given 
colors, is presented in [19]. The off-line seat reservation problem without the fairness 
restriction is equivalent to the maximum k-colorable subgraph problem for interval
graphs, which is solvable in polynomial time [24]. Various other problems which can 
be viewed as variants of the seat reservation problem are optical routing with a limited 
number of wavelengths [1,4,13,22], call control [2], and interval scheduling [21]. 

Before continuing, we introduce some basic notation. We use the notation
$x = [x_s, x_f)$ to denote an interval x from station $x_s$ to station $x_f$, where $1 \le x_s < x_f \le k$.
We say an interval x is a subinterval of the interval y if $y_s \le x_s$ and $x_f \le y_f$. Since a
request is just an interval, we will use the terms interchangeably, depending on what is
more natural at the time. The length of an interval (request) x is simply $x_f - x_s$. The
empty space containing x is the maximum length of a request which could be placed on
that seat and which contains x as a subinterval. At any given time, we say that a seat is
active if at least one request has been assigned to it, and inactive otherwise.

We consider the following three algorithms: First-Fit is the algorithm which places a 
request on the first seat which is unoccupied for the length of the journey. Best-Fit places 
a request on a seat such that the empty space containing that request is minimized. We 
note that to fully define the algorithm we must also specify a tie-breaker, that is, what 
happens when there is more than one such seat. However, since we would like to keep
our results as widely applicable as possible, we will not assume any specific tie-breaker
in any of our proofs. Our results will thus hold for any choice of a tie-breaker for Best-Fit.
In some cases, bounds could be tightened slightly with knowledge of the tie-breaker².
However, these improvements are minor, and do not change the meaning of the results. 
Worst-Fit places a request on a seat such that the empty space containing that request is 
maximized. Again, we assume that any tie-breaker may be chosen, and our results hold 
for all such choices. In this case, however, knowledge of the tie-breaker would not help 
tighten any of our bounds. Additionally, we consider the class of Any-Fit algorithms, 
inspired by a class of Bin Packing algorithms of the same name defined by Johnson in 
[16]. An Any-Fit algorithm places a request on an inactive seat only if it does not
fit into any of the active seats.
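The three algorithms can be sketched in a few lines. This is an illustrative simulation, not code from the paper: it assumes 0-indexed seats and breaks ties in favor of the lowest-numbered seat (the paper deliberately leaves the tie-breaker open), and `empty_space` and `run` are hypothetical helper names. With this tie-breaker, First-Fit and Best-Fit are both Any-Fit algorithms: an active seat always offers a shorter empty space than the k − 1 of an inactive seat.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # request [s, f): from station s to station f
Option = Tuple[int, int]    # (empty space containing the request, seat index)


def empty_space(seat: List[Interval], x: Interval, k: int) -> Optional[int]:
    """Length of the maximal free stretch of `seat` containing x,
    or None if x overlaps a request already assigned to the seat."""
    s, f = x
    if any(a < f and s < b for a, b in seat):
        return None
    left = max([b for a, b in seat if b <= s], default=1)
    right = min([a for a, b in seat if a >= f], default=k)
    return right - left


def first_fit(options: List[Option]) -> int:
    return min(i for _, i in options)  # lowest-numbered seat that fits


def best_fit(options: List[Option]) -> int:
    return min(options)[1]  # smallest empty space, ties to the lowest seat


def worst_fit(options: List[Option]) -> int:
    return min(options, key=lambda o: (-o[0], o[1]))[1]  # largest empty space


def run(rule: Callable[[List[Option]], int], requests: List[Interval],
        n: int, k: int) -> Tuple[List[Optional[int]], List[List[Interval]]]:
    """Serve requests on-line with n seats and k stations. Returns the seat
    chosen for each request (None if rejected) and the final seat contents."""
    seats: List[List[Interval]] = [[] for _ in range(n)]
    placed: List[Optional[int]] = []
    for x in requests:
        spaces = [(empty_space(seats[i], x, k), i) for i in range(n)]
        options = [(sp, i) for sp, i in spaces if sp is not None]
        if not options:  # fairness: reject only if no seat is free for x
            placed.append(None)
            continue
        i = rule(options)
        seats[i].append(x)
        placed.append(i)
    return placed, seats
```

On the requests [1, 3), [3, 5), [1, 5) with n = 2 seats and k = 5 stations, First-Fit and Best-Fit put the first two requests on the same seat and accept all three, while Worst-Fit moves [3, 5) to the empty seat and has to reject [1, 5).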



3 The (Relative) Worst Order Ratio 

In this section, we define the relative worst order ratio and the notion of two algorithms
being comparable (Definition 2) as in [6], though, for the sake of simplicity, only for 
maximization problems, such as the seat reservation problem. 

Many algorithms are designed with certain kinds of permutations of the input in 
mind, making them very efficient for some permutations but very inefficient for others. 
Thus, given a set of requests, if we were to compare the performance of two algo- 
rithms directly to each other, we would get certain permutations where one algorithm 
strongly outperforms the other while the opposite would hold for other permutations, 
making the algorithms incomparable. Hence, we will consider sequences over the same 
set of requests together, and we will compare the performance of two algorithms on 
their respective worst-case permutations. To this end, we formally define $A_W(I)$, the
performance of an on-line algorithm A on the “worst permutation” of the sequence I
of requests, as follows:

Definition 1. Consider an on-line maximization problem P and let I be any request
sequence of length n. If σ is a permutation on n elements, then σ(I) denotes I permuted
by σ. Let A be any algorithm for P. A(I) is the value of running A on I, and
$A_W(I) = \min_{\sigma} A(\sigma(I))$.
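Definition 1 can be evaluated by brute force on tiny instances. A sketch for the unit price problem, again assuming lowest-numbered-seat tie-breaking; `value` and `worst_order_value` are hypothetical names, and enumerating all |I|! orderings is only practical for a handful of requests.

```python
from itertools import permutations
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # request [s, f)
Option = Tuple[int, int]    # (empty space containing the request, seat index)


def empty_space(seat: List[Interval], x: Interval, k: int) -> Optional[int]:
    s, f = x
    if any(a < f and s < b for a, b in seat):
        return None  # x overlaps a request already on this seat
    left = max([b for a, b in seat if b <= s], default=1)
    right = min([a for a, b in seat if a >= f], default=k)
    return right - left


def first_fit(options: List[Option]) -> int:
    return min(i for _, i in options)


def worst_fit(options: List[Option]) -> int:
    return min(options, key=lambda o: (-o[0], o[1]))[1]


def value(rule: Callable[[List[Option]], int], requests: List[Interval],
          n: int, k: int) -> int:
    """A(I) for the unit price problem: the number of accepted requests."""
    seats: List[List[Interval]] = [[] for _ in range(n)]
    for x in requests:
        spaces = [(empty_space(seats[i], x, k), i) for i in range(n)]
        options = [(sp, i) for sp, i in spaces if sp is not None]
        if options:  # fair: a request is rejected only if no seat fits it
            seats[rule(options)].append(x)
    return sum(len(seat) for seat in seats)


def worst_order_value(rule: Callable[[List[Option]], int],
                      requests: List[Interval], n: int, k: int) -> int:
    """A_W(I) = min over all permutations sigma of A(sigma(I))."""
    return min(value(rule, list(p), n, k) for p in permutations(requests))
```

For I = [1, 3), [3, 5), [1, 5) with n = 2 and k = 5, First-Fit accepts all three requests in every ordering, while Worst-Fit's worst ordering (for example [1, 3), [3, 5), [1, 5)) yields only two. A single instance like this only illustrates the definition; the ratios in Definition 2 concern all sequences and allow additive constants.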



Definition 2. Let $S_1(c)$ and $S_2(c)$ be statements about algorithms A and B defined in
the following way.

$S_1(c)$: There exists a constant b such that $A_W(I) \le c \cdot B_W(I) + b$ for all I.

$S_2(c)$: There exists a constant b such that $A_W(I) \ge c \cdot B_W(I) - b$ for all I.

The relative worst order ratio $WR_{A,B}$ of on-line algorithm A to algorithm B is defined
if $S_1(1)$ or $S_2(1)$ holds. In this case, A and B are said to be comparable. If $S_1(1)$ holds,
then $WR_{A,B} = \sup\{r \mid S_2(r)\}$, and if $S_2(1)$ holds, then $WR_{A,B} = \inf\{r \mid S_1(r)\}$.

² Specifically, the relative worst order ratio of First-Fit to Best-Fit can be slightly improved in
Theorem 3 and in Theorem 6.
The statements $S_1(1)$ and $S_2(1)$ check that one algorithm is always at least as good as
the other on every sequence (on their respective worst permutations). When one of them
holds, the relative worst order ratio is a bound on how much better the one algorithm can
be. Note that if $S_1(1)$ holds, the supremum involves $S_2$ rather than $S_1$, and vice versa.

The constant b in the definitions of $S_1(c)$ and $S_2(c)$ must be independent of the
sequence I, and for the seat reservation problem, it must also be independent of k and
n. A ratio of 1 means that the two algorithms perform identically with respect to this
quality measure; the further away from 1, the greater the difference in performance. The
ratio is greater than one if the first algorithm is better and less than one if the second
algorithm is better. It is easily shown [6] that the relative worst order ratio is a transitive
measure, i.e., for any three algorithms A, B, and C, $WR_{A,B} \le 1$ and $WR_{B,C} \le 1$ implies
$WR_{A,C} \le 1$.

Although one of the goals in defining the relative worst order ratio was to avoid the 
intermediate comparison of any on-line algorithm, A, to the optimal off-line algorithm, 
OPT, it is still possible to compare on-line algorithms to OPT. In this case, the measure 
is called the worst order ratio [6], denoted $WR_A = WR_{A,\mathrm{OPT}}$. This ratio can be used
to bound the relative worst order ratio between two algorithms and in some cases gives 
tight results. Thus, although it is generally most interesting to compare on-line algorithms 
directly to each other, the worst order ratio can also be useful in its own right. 

4 The Relation Between the Worst Order Ratio and the 
Competitive Ratio on Accommodating Sequences 

In this section, we show a connection between the worst order ratio and the competitive 
ratio on accommodating sequences [10], which is relevant to the seat reservation problem 
when the management has made a good guess as to how many seats are necessary for 
the expected number of passengers. A sequence for which all requests can be accepted 
within n seats is called an accommodating sequence. For a maximization problem, an 
algorithm A is c-competitive on accommodating sequences if, for every accommodating 
sequence I, $A(I) \ge c \cdot \mathrm{OPT}(I) - b$, where b is a fixed constant for the given problem,
and, thus, independent of I. The competitive ratio on accommodating sequences for 
algorithm A is defined as 

$\sup\{c \mid A \text{ is } c\text{-competitive on accommodating sequences}\}$.

The major result of this section shows that the worst order ratio for any memory- 
less, deterministic algorithm for the seat reservation problem, regardless of the pricing 
policy, is equal to its competitive ratio on accommodating sequences. An algorithm is 
memoryless if it never uses any information about anything but the current request and 
the current configuration (which requests have been placed where) in making a decision 
about the current request. A memoryless algorithm never uses information about the or- 
der the requests came in or about any of the rejected requests. All algorithms considered 
in this paper are memoryless. 

In the proof showing this connection, it is shown that there is a permutation of a 
particular subsequence which will force OPT to accept every item in that subsequence, 
using the following lemma: 
Lemma 1. Any algorithm A for the seat reservation problem will accept all requests in
any accommodating sequence, I, if the requests in I are in nondecreasing order by left
endpoint.

Proof. Consider any request $r = [r_s, r_f)$ in the sequence I. Since the sequence is
accommodating, there are at most n requests (including r itself) containing the subinterval
$[r_s, r_s + 1)$. Thus, when r occurs in the sequence, there is some seat which A has left empty
from $r_s$ to $r_s + 1$. Because of the ordering of the requests, if the seat is empty from $r_s$ to
$r_s + 1$, it is also empty to the right of $r_s$. Since any algorithm for the seat reservation problem
is fair, the request will be accepted. Thus, the entire sequence will be accepted. □

Theorem 1. Let A be a deterministic algorithm for the seat reservation problem. If A 
is memoryless, then A’s worst order ratio and its competitive ratio on accommodating
sequences are equal, regardless of the pricing policy. Otherwise, A’s worst order ratio 
is no larger than its competitive ratio on accommodating sequences and at least the 
competitive ratio on accommodating sequences of some algorithm. 

Proof. First assume that $WR_A > c$. Then, there exists a constant b such that
$A_W(I) \ge c \cdot \mathrm{OPT}_W(I) - b$ for all input sequences I. It follows from the definitions that
$A(I) \ge A_W(I)$ and $\mathrm{OPT}_W(I) = \mathrm{OPT}(I)$ for all accommodating sequences I. Hence, there exists a
constant b such that $A(I) \ge c \cdot \mathrm{OPT}(I) - b$ for all accommodating sequences I, so A
is c-competitive on accommodating sequences. Thus, the worst order ratio is at most as
large as the competitive ratio on accommodating sequences.

To prove the other direction, we consider an arbitrary input sequence I and a worst-case
permutation of I for A, $I_A$. Let $I_{acc}$ be the subsequence of $I_A$ containing all the
requests in $I_A$ which are accepted by A. Order the requests in $I_{acc}$ in nondecreasing
order by their left endpoints. Then, place this ordered sequence at the beginning of a new
sequence, $I_{OPT}$, followed by the remaining requests in I, giving a permutation
of I. Notice that by the above lemma, OPT will be forced to accept all requests in $I_{acc}$
when given $I_{OPT}$. Let the subset of the requests it accepts from $I_{OPT}$ be $I'$. In OPT’s worst
permutation of I, OPT accepts at most $|I'|$ requests. Clearly, $I'$ is an accommodating
sequence. If A is memoryless, then we can without loss of generality assume that the
items it rejects from a sequence are at the end of that sequence. Thus if, in a permutation
of $I'$, the items in $I_{acc}$ are placed in the same relative order as in $I_A$, followed by the
remaining items from $I'$, A will accept only those in $I_{acc}$. If A’s competitive ratio on
accommodating sequences is c, then for some constant b, $A_W(I') \ge c \cdot |I'| - b$, so
$A_W(I) = |I_{acc}| \ge c \cdot |I'| - b$, and $A_W(I) \ge c \cdot \mathrm{OPT}_W(I) - b$. Since this holds for any
request sequence I, $WR_A$ is at least A’s competitive ratio on accommodating sequences.

If A is not memoryless, it is not obvious that there is a permutation of $I'$ which
would cause A to accept only $I_{acc}$. However, there is clearly some on-line algorithm, B,
which would accept only $I_{acc}$. Following the reasoning above, assuming B’s competitive
ratio on accommodating sequences is c, $B_W(I') \ge c \cdot |I'| - b$ implies
$A_W(I) \ge c \cdot \mathrm{OPT}_W(I) - b$. Thus, $WR_A$ is at least B’s competitive ratio on accommodating sequences.

□

The theorem above, combined with results on the competitive ratio on accommo- 
dating sequences [3], immediately gives that for k much larger than n, the worst order 
ratio for any deterministic algorithm for the unit price problem is close to 1/2.
Corollary 1. The worst order ratio for any deterministic algorithm for the unit price
problem with n > 3 seats is at least | and most ^ + 2 fc+ 6 w^( 8 + 2 c) ’ k >7 and 

c = k − 1 (mod 6).

This result is interesting in that it gives a much more optimistic prediction for the 
unit price problem than the competitive ratio, which is not bounded below by a constant. 
For the proportional price problem, the competitive ratio on accommodating sequences 
has not been shown to be different from the competitive ratio [10]. Thus, if we similarly 
try to extend the theorem above to the proportional price problem, we do not get any 
results that are different from the competitive ratio. 

The results above are also useful when considering the relative worst order ratio. The 
next corollary, which follows from Theorem 1 and the results from [10], gives bounds 
on the relative worst order ratios for the algorithms we consider. 

Corollary 2. For any two comparable deterministic algorithms A and B, 

- for the unit price problem, $1/2 \le WR_{A,B} \le 2$, and

- for the proportional price problem, $1/(k-1) \le WR_{A,B} \le k - 1$.



5 The Unit Price Problem 

In this section, we will investigate the relative worst order ratios of deterministic al- 
gorithms for the unit price problem. Without loss of generality, we assume within the 
proofs that the price of all tickets is one unit of profit. The algorithms we consider 
make the same decisions regardless of the pricing policy used. Thus, we can make some 
conclusions about their relative performance for the proportional price problem while 
analyzing their relative performance for the unit price problem. 

5.1 First-Fit Is at Least as Good as Any Any-Fit Algorithm 

Our first result is based on the fact that given an input sequence and First-Fit’s arrange- 
ment of it, an Any-Fit algorithm can be forced to make the exact same seat arrangements 
by permuting the sequence in an appropriate way. 

Theorem 2. For any Any-Fit algorithm A, $WR_{FF,A} \ge 1$, regardless of pricing policy.

Proof. We will consider an arbitrary input sequence I and its worst-case permutation for
First-Fit, $I_{FF}$. We will show that there exists a permutation of I, $I_A$, such that
$A(I_A) = FF(I_{FF})$. This will imply that $FF_W(I) = FF(I_{FF}) = A(I_A) \ge A_W(I)$. Since this will
hold for all I, we will have proven the theorem.

Without loss of generality, we will assume that all requests which are rejected by
First-Fit appear last in $I_{FF}$ and that when A must choose a new seat to activate, it will
choose the seat with the smallest number.

Let the height of a request in $I_{FF}$ be the seat it was assigned to by First-Fit, and ∞ if
it was rejected by First-Fit. Let $I_A$ be a permutation of I where all the requests appear
in order of nondecreasing height. We prove that $A(I_A) = FF(I_{FF})$ by induction. The
induction hypothesis is that after processing all requests with height up to and including
i, A will make the same seat assignments as First-Fit. For the base case i = 0, no seats
have been assigned, so the inductive hypothesis holds trivially.

For the general case of $1 \le i \le n$, we consider when A encounters the first request
with height i. At this point, A has filled the first i − 1 seats exactly as First-Fit, and seats
i, ..., n remain inactive. Since this request could not be fit into any of the first i − 1 seats
by First-Fit, it cannot be fit into any of the first i − 1 seats by A. It will therefore be
placed in the first available inactive seat, which is seat i.

Now consider when A encounters any other request r with height i. At this point, A
has filled the first i − 1 seats with at least the same requests as First-Fit, and now it has
activated other seats as well. Seat i is now active. Again, r cannot fit into any of the first
i − 1 seats. Moreover, since the only possible requests to be placed on seat i at this point
must have height i and all requests with the same height must be non-overlapping, A
can fit r in seat i. Since A is an Any-Fit algorithm, it will necessarily assign r to seat i.

For the case of i = ∞, A is not able to accommodate these requests, because if it
could, then First-Fit would have accommodated them as well. Therefore, A will reject
these requests. □
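The permutation $I_A$ from this proof can be constructed mechanically. A sketch, assuming lowest-numbered-seat tie-breaking and using Best-Fit as the Any-Fit algorithm A; it starts from an arbitrary ordering of the requests rather than First-Fit's worst one (finding that would require a search). The instance is the ordering used in the proof of Theorem 3 with n = 5 and k = 10, on which Best-Fit, in the original order, accepts only 14 of the 16 requests. All helper names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # request [s, f)
Option = Tuple[int, int]    # (empty space containing the request, seat index)


def empty_space(seat: List[Interval], x: Interval, k: int) -> Optional[int]:
    s, f = x
    if any(a < f and s < b for a, b in seat):
        return None  # x overlaps a request already on this seat
    left = max([b for a, b in seat if b <= s], default=1)
    right = min([a for a, b in seat if a >= f], default=k)
    return right - left


def first_fit(options: List[Option]) -> int:
    return min(i for _, i in options)


def best_fit(options: List[Option]) -> int:
    return min(options)[1]  # smallest empty space, ties to the lowest seat


def run(rule: Callable[[List[Option]], int], requests: List[Interval],
        n: int, k: int) -> List[Optional[int]]:
    """Seat chosen for each request in order, or None if it was rejected."""
    seats: List[List[Interval]] = [[] for _ in range(n)]
    placed: List[Optional[int]] = []
    for x in requests:
        spaces = [(empty_space(seats[i], x, k), i) for i in range(n)]
        options = [(sp, i) for sp, i in spaces if sp is not None]
        choice = rule(options) if options else None
        if choice is not None:
            seats[choice].append(x)
        placed.append(choice)
    return placed


def accepted(rule, requests: List[Interval], n: int, k: int) -> int:
    return sum(p is not None for p in run(rule, requests, n, k))


def theorem3_sequence(n: int, k: int) -> List[Interval]:
    m = n // 2  # number of tuples in each of the three groups
    return ([(1, 2), (5, k - 4), (k - 1, k)] * m
            + [(3, k - 2), (2, 3), (k - 2, k - 1)] * m
            + [(1, 3), (k - 2, k)] * m)


def height_order(requests: List[Interval], n: int, k: int) -> List[Interval]:
    """I_A: requests sorted by First-Fit height, rejected ones (height inf) last."""
    placed = run(first_fit, requests, n, k)
    height = [float("inf") if p is None else p for p in placed]
    order = sorted(range(len(requests)), key=lambda j: height[j])
    return [requests[j] for j in order]


seq = theorem3_sequence(5, 10)
print(accepted(first_fit, seq, 5, 10), accepted(best_fit, seq, 5, 10),
      accepted(best_fit, height_order(seq, 5, 10), 5, 10))
```

After reordering by First-Fit height, Best-Fit reproduces First-Fit's seat assignment and accepts all 16 requests.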

This theorem alone does not separate First-Fit from Best-Fit, but the following the- 
orem gives us a family of input sequences for which First-Fit will out-perform Best-Fit. 

Theorem 3. For the unit price problem with k ≥ 10, $4/3 \le WR_{FF,BF} \le 2$.

Proof. The upper bound follows directly from Corollary 2. Since Theorem 2 shows that
$WR_{FF,BF} \ge 1$, it is sufficient to find a family of sequences $I_n$ with $\lim_{n\to\infty} FF_W(I_n) = \infty$,
where there exists a constant b such that for all $I_n$, $FF_W(I_n) \ge \frac{4}{3} BF_W(I_n) - b$.

Consider the sequence $I_n$ beginning with $\lfloor n/2 \rfloor$ request tuples [1, 2), [5, k − 4), [k − 1, k),
followed by $\lfloor n/2 \rfloor$ request tuples [3, k − 2), [2, 3), [k − 2, k − 1). We then end
the sequence with $\lfloor n/2 \rfloor$ request tuples [1, 3), [k − 2, k). Clearly, even in the worst-case
ordering, First-Fit will accommodate all requests, so $FF_W(I_n) = 8 \lfloor n/2 \rfloor$. Best-Fit, on the
other hand, will accommodate at most two requests from the last $\lfloor n/2 \rfloor$ tuples given this
ordering (when n is odd), so $BF_W(I_n) \le 6 \lfloor n/2 \rfloor + 2$. The result follows:
$FF_W(I_n) \ge \frac{4}{3} BF_W(I_n) - \frac{8}{3}$. □
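The counts in this proof can be checked by simulating the given ordering of $I_n$. A sketch assuming lowest-numbered-seat tie-breaking, with hypothetical helper names. Since it only runs the given ordering, it confirms the bound on $BF_W(I_n)$ and First-Fit's behavior on this ordering, not First-Fit's behavior on its worst ordering.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # request [s, f)
Option = Tuple[int, int]    # (empty space containing the request, seat index)


def empty_space(seat: List[Interval], x: Interval, k: int) -> Optional[int]:
    s, f = x
    if any(a < f and s < b for a, b in seat):
        return None  # x overlaps a request already on this seat
    left = max([b for a, b in seat if b <= s], default=1)
    right = min([a for a, b in seat if a >= f], default=k)
    return right - left


def first_fit(options: List[Option]) -> int:
    return min(i for _, i in options)  # lowest-numbered seat that fits


def best_fit(options: List[Option]) -> int:
    return min(options)[1]  # smallest empty space, ties to the lowest seat


def accepted(rule: Callable[[List[Option]], int], requests: List[Interval],
             n: int, k: int) -> int:
    """Unit-price value: number of requests accepted, in the given order."""
    seats: List[List[Interval]] = [[] for _ in range(n)]
    for x in requests:
        spaces = [(empty_space(seats[i], x, k), i) for i in range(n)]
        options = [(sp, i) for sp, i in spaces if sp is not None]
        if options:
            seats[rule(options)].append(x)
    return sum(len(seat) for seat in seats)


def theorem3_sequence(n: int, k: int) -> List[Interval]:
    """The ordering of I_n from the proof, with floor(n/2) tuples per group."""
    m = n // 2
    return ([(1, 2), (5, k - 4), (k - 1, k)] * m
            + [(3, k - 2), (2, 3), (k - 2, k - 1)] * m
            + [(1, 3), (k - 2, k)] * m)


for n in (3, 5, 7, 9):
    seq = theorem3_sequence(n, 10)
    print(n, accepted(first_fit, seq, n, 10), accepted(best_fit, seq, n, 10))
```

For odd n, First-Fit accepts all $8\lfloor n/2 \rfloor$ requests of this ordering, while Best-Fit accepts $6\lfloor n/2 \rfloor + 2$: only the first tuple of the last group finds room, on the single seat left empty.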

It remains an open problem to close the gap between 4/3 and 2, though the relative
performance of First-Fit to Best-Fit is established.

5.2 Worst-Fit Is at Least as Bad as Any Deterministic Algorithm 

Worst-Fit spreads out the requests, creating many short empty intervals, instead of fewer, 
but longer, empty intervals, as with Best-Fit. The following theorem shows that this 
strategy is not very successful. 

Theorem 4. For any deterministic algorithm A, $WR_{A,WF} \ge 1$, regardless of pricing
policy.

Proof. We will consider an arbitrary input sequence I and its worst-case permutation
for A, $I_A$. We will show that there exists a permutation of I, $I_{WF}$, for which Worst-Fit
will reject at least all the elements that A rejected. This will imply
$A_W(I) = A(I_A) \ge WF(I_{WF}) \ge WF_W(I)$. Since this will hold for all I, we will have proven the theorem.

We construct $I_{WF}$ by ordering all the requests A accepted in nondecreasing order of
their start station, followed by all the rejected requests in arbitrary order. Let r be any
request rejected by A. Consider the set of requests $S = \{s_1, s_2, \ldots, s_n\}$, which are the
first n elements in $I_{WF}$ which overlap r. Such a set must exist since r was rejected by
A. We claim that no two requests from S will be placed in the same seat by Worst-Fit.
If the claim holds, then it will imply that r is rejected by Worst-Fit.

We prove the claim by contradiction. Suppose there exist two requests, $x, y \in S$,
such that Worst-Fit places them in the same seat. Without loss of generality, we assume
Worst-Fit processes x before y. Since requests appear in nondecreasing order of their
start station in $I_{WF}$, we have that y lies to the right of x. Now consider the point in time
when Worst-Fit processes y. Since S contains the first n requests in $I_{WF}$ overlapping
r, and Worst-Fit has not processed all of them yet, there must be a seat for which the
interval r is still empty. Furthermore, since Worst-Fit has not yet processed any requests
that lie completely to the right of r, there exists a free interval on this seat of length
$s \ge k - r_s$ into which Worst-Fit could place y. On the other hand, the free interval on
the seat of x has length $s' \le k - x_f$. Since x overlaps r, we have $x_f > r_s$, and hence $s > s'$.
Thus Worst-Fit would not place y on the same seat as x, and we have reached a contradiction. □
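The construction of $I_{WF}$ can be illustrated on a concrete run. A sketch under the same assumptions as before (lowest-numbered-seat tie-breaking, hypothetical helper names), taking Best-Fit as A and the ordering from the proof of Theorem 3 with n = 5 and k = 10 as I. Best-Fit rejects two requests there, and Worst-Fit on the reordered sequence rejects at least as many, as the theorem predicts.

```python
from typing import Callable, List, Optional, Tuple

Interval = Tuple[int, int]  # request [s, f)
Option = Tuple[int, int]    # (empty space containing the request, seat index)


def empty_space(seat: List[Interval], x: Interval, k: int) -> Optional[int]:
    s, f = x
    if any(a < f and s < b for a, b in seat):
        return None  # x overlaps a request already on this seat
    left = max([b for a, b in seat if b <= s], default=1)
    right = min([a for a, b in seat if a >= f], default=k)
    return right - left


def best_fit(options: List[Option]) -> int:
    return min(options)[1]  # smallest empty space, ties to the lowest seat


def worst_fit(options: List[Option]) -> int:
    return min(options, key=lambda o: (-o[0], o[1]))[1]  # largest empty space


def run(rule: Callable[[List[Option]], int], requests: List[Interval],
        n: int, k: int) -> List[Optional[int]]:
    """Seat chosen for each request in order, or None if it was rejected."""
    seats: List[List[Interval]] = [[] for _ in range(n)]
    placed: List[Optional[int]] = []
    for x in requests:
        spaces = [(empty_space(seats[i], x, k), i) for i in range(n)]
        options = [(sp, i) for sp, i in spaces if sp is not None]
        choice = rule(options) if options else None
        if choice is not None:
            seats[choice].append(x)
        placed.append(choice)
    return placed


def theorem3_sequence(n: int, k: int) -> List[Interval]:
    m = n // 2
    return ([(1, 2), (5, k - 4), (k - 1, k)] * m
            + [(3, k - 2), (2, 3), (k - 2, k - 1)] * m
            + [(1, 3), (k - 2, k)] * m)


def wf_order(rule, requests: List[Interval], n: int, k: int) -> List[Interval]:
    """I_WF from the proof: requests accepted by `rule`, sorted (stably) by
    start station, followed by the requests it rejected."""
    placed = run(rule, requests, n, k)
    kept = [x for x, p in zip(requests, placed) if p is not None]
    rejected = [x for x, p in zip(requests, placed) if p is None]
    return sorted(kept, key=lambda x: x[0]) + rejected


seq = theorem3_sequence(5, 10)
a_rejected = run(best_fit, seq, 5, 10).count(None)
wf_rejected = run(worst_fit, wf_order(best_fit, seq, 5, 10), 5, 10).count(None)
print(a_rejected, wf_rejected)
```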

Additionally, we can prove an asymptotically tight bound for the relative worst order 
ratio of Worst-Fit to both First-Fit and Best-Fit, which is as bad as Worst-Fit can be with 
respect to any algorithm. The following proof uses a family of sequences, first used in 
[9], which can be intuitively seen to cause Worst-Fit to perform very poorly. This idea 
is formalized with respect to the relative worst order ratio in the following theorem. 

Theorem 5. For any Any-Fit algorithm $A$ for the unit price problem,

$$2 - \frac{1}{k-1} \le \mathrm{WR}_{A,\mathrm{WF}} \le 2.$$

Proof. The upper bound follows directly from Corollary 2. Since Theorem 4 implies that $\mathrm{WR}_{A,\mathrm{WF}} \ge 1$, to prove the lower bound it is sufficient to find a family of sequences $I_n$ with $\lim_{n\to\infty} \mathrm{WF}_W(I_n) = \infty$, for which there exists a constant $b$ such that for all $I_n$, $A_W(I_n) \ge \bigl(2 - \frac{1}{k-1}\bigr)\mathrm{WF}_W(I_n) - b$.

We construct $I_n$ as follows. We begin the request sequence with $\lfloor n/(k-1) \rfloor$ requests for each of the intervals $[1,2), [2,3), \ldots, [k-1,k)$. In the case when $n$ is not divisible by $k-1$, we also give one additional request for each of the intervals $[1,2), \ldots, [(n \bmod (k-1)), (n \bmod (k-1)) + 1)$. If $n$ is divisible by $k-1$, then these requests are omitted. Then we finish the sequence with $n - \lceil n/(k-1) \rceil$ requests for the interval $[1,k)$. Regardless of the ordering, $A$ will accommodate all requests, so that $A_W(I_n) = 2n - \lceil n/(k-1) \rceil$. For Worst-Fit, the given ordering is the worst-case ordering: it will fill all the available seats with the first $n$ requests, while rejecting all the remaining requests. Therefore, $\mathrm{WF}_W(I_n) = n$. Since $\lceil n/(k-1) \rceil \le n/(k-1) + 1$, this gives us the needed ratio: $A_W(I_n) \ge \bigl(2 - \frac{1}{k-1}\bigr)\mathrm{WF}_W(I_n) - 1$. □

Corollary 3. $2 - \frac{1}{k-1} \le \mathrm{WR}_{\mathrm{FF},\mathrm{WF}} \le 2$ and $2 - \frac{1}{k-1} \le \mathrm{WR}_{\mathrm{BF},\mathrm{WF}} \le 2$.

Thus, we obtain a clear separation between Worst-Fit and First-Fit/Best-Fit, and the 
bounds on the ratio are asymptotically tight. 
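To make the construction in the proof of Theorem 5 concrete, the following Python sketch (ours, not the authors') builds $I_n$ and replays it, in the stated order, through First-Fit and Worst-Fit under unit pricing. It assumes stations numbered $1, \ldots, k$, that a request $[a,b)$ fits on a seat exactly when it overlaps no booked interval there, that First-Fit takes the lowest-indexed seat on which the request fits, and that Worst-Fit takes the seat whose maximal free interval around the request is longest (ties to the lowest index). The helper names `free_len`, `run` and `sequence_I` are hypothetical.

```python
from math import ceil

# Assumptions (ours): stations 1..k; Worst-Fit picks the seat with the longest
# maximal free interval around the request; ties go to the lowest seat index.


def free_len(seat, a, b, k):
    """Length of the maximal free interval on `seat` containing [a, b),
    or None if [a, b) overlaps a booked interval. Stations are 1..k."""
    lo, hi = 1, k
    for s, e in seat:
        if s < b and a < e:
            return None          # overlap: the request does not fit here
        if e <= a:
            lo = max(lo, e)      # booking lies to the left of the request
        else:
            hi = min(hi, s)      # booking lies to the right of the request
    return hi - lo


def run(requests, n, k, rule):
    """Process requests in the given order on n seats; return accepted ones."""
    seats = [[] for _ in range(n)]
    accepted = []
    for a, b in requests:
        fits = [(free_len(seat, a, b, k), i) for i, seat in enumerate(seats)]
        fits = [(length, i) for length, i in fits if length is not None]
        if not fits:
            continue                                   # request rejected
        if rule == "first":                            # lowest seat index
            _, i = min(fits, key=lambda t: t[1])
        else:                                          # "worst": longest free interval
            _, i = max(fits, key=lambda t: (t[0], -t[1]))
        seats[i].append((a, b))
        accepted.append((a, b))
    return accepted


def sequence_I(n, k):
    """The family I_n from the proof of Theorem 5 (stations 1..k)."""
    q, r = divmod(n, k - 1)
    reqs = [(j, j + 1) for j in range(1, k) for _ in range(q)]
    reqs += [(j, j + 1) for j in range(1, r + 1)]      # empty when r == 0
    reqs += [(1, k)] * (n - ceil(n / (k - 1)))
    return reqs


n, k = 10, 5
I_n = sequence_I(n, k)
ff = len(run(I_n, n, k, "first"))    # unit price: profit = number accepted
wf = len(run(I_n, n, k, "worst"))
print(ff, wf)                         # 17 10
```

With $n = 10$ and $k = 5$ this reproduces the counts from the proof: First-Fit accepts all $2n - \lceil n/(k-1) \rceil = 17$ requests, while Worst-Fit accepts only $n = 10$, because an empty seat always offers the longest free interval and so each of the first $n$ unit requests opens a new seat.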

6 The Proportional Price Problem 

For the proportional price problem, the ticket price is proportional to the distance trav- 
elled. Without loss of generality, we will assume in the proofs that the price of a ticket 
from station i to station j is j — i. It turns out that many of the results for the unit price 
problem can be transferred to the proportional price problem. Specifically, we still have
the result that First-Fit is at least as good as any Any-Fit algorithm, and Worst-Fit is 
at least as bad as any deterministic algorithm. One difference is that the value of the 
relative worst order ratio of First-Fit to Best-Fit is different, as we show in the following 
theorem. 

Theorem 6. For the proportional price problem with $k \ge 6$, $\frac{k+2}{6} \le \mathrm{WR}_{\mathrm{FF},\mathrm{BF}} \le k - 1$.

Proof. The upper bound follows directly from Corollary 2. Since Theorem 2 shows that $\mathrm{WR}_{\mathrm{FF},\mathrm{BF}} \ge 1$, it is sufficient to find a family of sequences $I_n$ with $\lim_{n\to\infty} \mathrm{FF}_W(I_n) = \infty$, such that for all $I_n$, $\mathrm{FF}_W(I_n) \ge \frac{k+2}{6}\,\mathrm{BF}_W(I_n)$.

We define the family of sequences $I_n$ only for even $n$. Consider the sequence beginning with $n/2$ request tuples $[1,2), [k-1,k)$, followed by $n/2$ request tuples $[k-3,k), [2,3)$. Finally, the sequence concludes with $n/2$ requests $[1,k-3)$. First-Fit will be able to place all the requests regardless of their ordering; since one request of each of the five kinds has total price $1 + 1 + 3 + 1 + (k-4) = k+2$, we get $\mathrm{FF}_W(I_n) = (k+2)\cdot\frac{n}{2}$. On the other hand, Best-Fit will not accommodate any of the last $n/2$ requests when given the ordering above, so $\mathrm{BF}_W(I_n) = (1 + 1 + 3 + 1)\cdot\frac{n}{2} = 6\cdot\frac{n}{2}$. The needed ratio follows. □
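The following Python sketch (ours, not the authors') replays the sequence from the proof of Theorem 6 through First-Fit and Best-Fit with proportional prices. It assumes stations numbered $1, \ldots, k$, that a request $[a,b)$ fits on a seat exactly when it overlaps no booked interval there, that First-Fit takes the lowest-indexed seat where the request fits, and that Best-Fit takes the seat whose maximal free interval around the request is shortest (ties to the lowest index). The helper names are hypothetical.

```python
# Assumptions (ours): stations 1..k; Best-Fit picks the seat with the shortest
# maximal free interval around the request; ties go to the lowest seat index.


def free_len(seat, a, b, k):
    """Length of the maximal free interval on `seat` containing [a, b),
    or None if [a, b) overlaps a booked interval. Stations are 1..k."""
    lo, hi = 1, k
    for s, e in seat:
        if s < b and a < e:
            return None          # overlap: the request does not fit here
        if e <= a:
            lo = max(lo, e)      # booking lies to the left of the request
        else:
            hi = min(hi, s)      # booking lies to the right of the request
    return hi - lo


def run(requests, n, k, rule):
    """Process requests in order on n seats with First-Fit or Best-Fit."""
    seats = [[] for _ in range(n)]
    accepted = []
    for a, b in requests:
        fits = [(free_len(seat, a, b, k), i) for i, seat in enumerate(seats)]
        fits = [(length, i) for length, i in fits if length is not None]
        if not fits:
            continue                                  # request rejected
        if rule == "first":
            _, i = min(fits, key=lambda t: t[1])      # lowest seat index
        else:                                         # "best": shortest free interval
            _, i = min(fits)                          # ties -> lowest seat index
        seats[i].append((a, b))
        accepted.append((a, b))
    return accepted


def sequence_for_theorem6(n, k):
    """The family from the proof of Theorem 6 (n even, k >= 6)."""
    half = n // 2
    return ([(1, 2), (k - 1, k)] * half
            + [(k - 3, k), (2, 3)] * half
            + [(1, k - 3)] * half)


def profit(accepted):
    """Proportional price: a ticket from station a to station b costs b - a."""
    return sum(b - a for a, b in accepted)


n, k = 4, 6
seq = sequence_for_theorem6(n, k)
print(profit(run(seq, n, k, "first")), profit(run(seq, n, k, "best")))   # 16 12
```

For $n = 4$ and $k = 6$, First-Fit earns $(k+2)\cdot n/2 = 16$ and Best-Fit $6 \cdot n/2 = 12$: Best-Fit tucks each $[2,3)$ next to a $[k-3,k)$ booking, and the seats holding $[1,2)$ are blocked as well, so no final $[1,k-3)$ request fits.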

Unlike for the unit price problem, the relative worst order ratio of First-Fit to Best-Fit 
is not bounded by a constant independent of k. Moreover, the gap between the lower 
bound and the upper bound increases as k goes to infinity, meaning that the bounds are 
not asymptotically tight. It would be interesting to see whether they can be made asymptotically tight.

The second difference between the proportional and unit price problem is the rela- 
tive worst order ratio of Worst-Fit to any Any-Fit algorithm. Specifically, we have the 
following theorem. 

Theorem 7. For any Any-Fit algorithm $A$ for the proportional price problem,

$$\mathrm{WR}_{A,\mathrm{WF}} = k - 1.$$

Proof. The upper bound follows directly from Corollary 2. Since Theorem 4 shows that $\mathrm{WR}_{A,\mathrm{WF}} \ge 1$, it is sufficient to find a family of sequences $I_n$ with $\lim_{n\to\infty} A_W(I_n) = \infty$, such that for all $I_n$, $A_W(I_n) \ge (k-1)\,\mathrm{WF}_W(I_n)$.

We will use the same sequence as was used in the proof of Theorem 5, except that we will define it only for $n$ divisible by $k-1$. The algorithms will still accept and reject the same requests, but the profit must be calculated differently. $\mathrm{WF}_W(I_n) = n$ still holds, but now $A_W(I_n) = n(k-1)$. The resulting ratio for the lower bound follows. □

Thus, the ratio of Worst-Fit to any Any-Fit algorithm is exact, and is as bad as can be.
We note that in the above proof we consider the same ordering of the sequence for both 
Worst-Fit and A, and A behaves exactly as OPT. This means we can also use the same 
sequence to prove that the competitive ratio for Worst-Fit is the worst possible among 
deterministic algorithms. 
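As a check on the profit computation in the proof of Theorem 7 (our arithmetic, with $n$ divisible by $k-1$): $A$ accepts all $n$ unit-length requests, at price 1 each, and all $n - n/(k-1)$ requests for $[1,k)$, at price $k-1$ each, while Worst-Fit accepts only the $n$ unit-length requests, so

```latex
A_W(I_n) = n \cdot 1 + \Bigl(n - \frac{n}{k-1}\Bigr)(k-1)
         = n + n(k-1) - n
         = n(k-1)
         = (k-1)\,\mathrm{WF}_W(I_n).
```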

7 Concluding Remarks and Open Problems 

The relative worst order ratio has already been applied to some problems, and has led 
to intuitively and/or experimentally correct results which could not be obtained with 
the competitive ratio [6,8,12,20]. For the seat reservation problem, applying the relative 
worst order ratio has proven very helpful in differentiating between various deterministic 
algorithms that could not be differentiated with the competitive ratio. Moreover, previous 
work studying the seat reservation problem with respect to the competitive ratio and the 
competitive ratio on accommodating sequences has essentially ignored the proportional 
price problem, since all the results have been so negative. In contrast, the relative worst 
order ratio allows us to easily compare algorithms for the proportional price problem. 

It remains interesting to see if the assumption that A is memoryless is necessary 
in Theorem 1. As is, Theorem 1 is interesting in that it gives a relationship between 
the relative worst order ratio and the competitive ratio on accommodating sequences. 
The direction showing that the worst order ratio for an algorithm A is no larger than 
its competitive ratio on accommodating sequences clearly applies to any maximization 
problem (and the opposite inequality for any minimization problem). However, the other 
direction does not hold for all problems. For dual bin packing, a problem where most 
results have resembled those for the unit price seat reservation problem, $\mathrm{WR}_A = 0$ for any fair, deterministic algorithm $A$ [6], although the competitive ratio on accommodating sequences is always bounded below by a positive constant [11].

With respect to the algorithms described in this paper, the most interesting open problem is to close the gap between the known lower bound and the upper bound of 2 for the ratio of First-Fit to Best-Fit. Ultimately, the goal is to find an algorithm that is better than the existing ones, as has been done for the paging problem [8]. In this sense, the most interesting open problem remains to find an algorithm that does better than First-Fit, or show that one does not exist.

Abstract. The standard dynamic programming solution to finding k-medians on a line with $n$ nodes requires $O(kn^2)$ time. Dynamic programming speed-up techniques, e.g., use of the quadrangle inequality or properties of totally monotone matrices, can reduce this to $O(kn)$ time, but these techniques are inherently static. The major result of this paper is to show that we can maintain the dynamic programming speedup in an online setting where points are added from left to right on a line. Computing the new k-medians after adding a new point takes only $O(k)$ amortized time and $O(k \log n)$ worst case time (simultaneously). Using similar techniques, we can also solve the online k-coverage with uniform coverage on a line problem with the same time bounds.



1 Introduction 

In the k-median problem we are given a graph $G = (V, E)$ with nonnegative edge costs. We want to choose $k$ nodes (the medians) from $V$ so as to minimize the sum of the distances between each node and its closest median. As motivation, the nodes can be thought of as customers, the medians as service centers, and the distance between a customer and a service center as the cost of servicing the customer from that center. In this view, the k-median problem is about choosing a set of $k$ service centers that minimizes the total cost of servicing all customers.

The k-median problem is often extended so that each customer (node) has a
weight, corresponding to the amount of service requested. The distance between 
a customer and its closest service center (median) then becomes the cost of 
providing one unit of service, i.e., the cost of servicing a customer will then 
be the weight of the customer node times its distance from the closest service 
center. Another extension of the problem is to assign a start-up cost to each node 
representing the cost of building a service center at that node. The total cost 
we wish to minimize is then the sum of the start-up costs of the chosen medians 
plus the cost of servicing each of the customer requests. This is known as the 
facility location problem. 

* This work partially supported by Hong Kong RGC grants HKUST6010/01E, HKUST6162/00E, HKUST6082/01E and HKUST6206/02. The authors would like to thank Gerhard Trippen for his help in proofreading and latexing the figures.

The k-Median on a Line Problem (kML)

Let $k > 0$. Let $x_1 < x_2 < \cdots < x_n$ be points on the real line. With each point $x_j$ there are associated a weight $w_j \ge 0$ and a start-up cost $c_j \ge 0$. A k-placement is a subset $S \subseteq V_m = \{x_1, \ldots, x_m\}$ of size $|S|$ at most $k$. We define the distance of point $x_j$ to $S$ by $d_j(S) = \min_{y \in S} |x_j - y|$. The cost of $S$ is (i) the cost of creating the service centers in $S$ plus (ii) the cost of servicing all of the requests from $S$:

$$\mathrm{cost}(S) = \sum_{x_i \in S} c_i + \sum_{j=1}^{n} w_j\, d_j(S).$$

The k-median on a line problem (kML) is to find a k-placement $S$ minimizing $\mathrm{cost}(S)$. In online kML, the points are given to us in the order $x_1, x_2, \ldots$, and we have to compute optimal solutions for the known points at any time.



Fig. 1. The k-median on a line problem.



Lin and Vitter [7] proved that, in general, even finding an approximate solution to the k-median problem is NP-hard. They were able to show, though, that it is possible in polynomial time to achieve a cost within $O(1+\epsilon)$ of optimal if one is allowed to use $(1 + 1/\epsilon)(\ln n + 1)k$ medians. The problem remains hard if restricted to metric spaces: Guha and Khuller [5] proved that this problem is still MAX-SNP hard. Charikar, Guha, Tardos and Shmoys [4] showed that constant-factor approximations can be computed for any metric space. In the specific case of points in Euclidean space, Arora, Raghavan, and Rao [2] developed a PTAS.

There are some special graph topologies for which fast polynomial time al- 
gorithms exist, though. In particular, this is true for trees [8,10] and lines [6]. In 
this paper we will concentrate on the line case, in which all of the nodes lie on 
the real line and the distance between any two nodes is the Euclidean distance. 
See Fig. 1 for the exact definition of the k-median on a line problem (kML).

There is a straightforward $O(kn^2)$ dynamic programming (DP) algorithm for solving kML. It fills in $O(kn)$ entries in a dynamic programming table¹, where calculating each entry requires minimizing over $O(n)$ values, so the entire algorithm needs $O(kn^2)$ time. Hassin and Tamir [6] showed that this DP formulation possesses a quadrangle or concavity property. Thus, the time to calculate the table entries can be reduced by an order of magnitude to $O(kn)$ using known DP speed-up techniques, such as those found in [9].

¹ We do not give the details here because the DP formulation is very similar to the one shown in Lemma 1.

The k-Coverage on a Line Problem (kCL)

In addition to the requirements of kML, each node $x_j$ is also given a coverage radius $r_j$. It is covered by a k-placement $S$ if $d_j(S) \le r_j$. In that case, the service cost for $x_j$ is zero. Otherwise, the service cost is $w_j$. The cost of $S$ is then

$$\mathrm{cost}(S) = \sum_{x_i \in S} c_i + \sum_{j=1}^{n} w_j\, I_j(S),$$

where $I_j(S) = 0$ if $d_j(S) \le r_j$ and $I_j(S) = 1$ if $d_j(S) > r_j$. The k-coverage on a line problem (kCL) is to find a k-placement $S$ minimizing $\mathrm{cost}(S)$. Online kCL is defined similarly to online kML.

Fig. 2. The k-coverage on a line problem.

In this paper we study online kML. Since static kML can be solved in $O(kn)$ time, our hope would be to add new points in $O(k)$ time. The difficulty here is that Hassin and Tamir's approach cannot be made online because most DP speed-up techniques such as in [9] are inherently static. The best that can be done using their approach is to totally recompute the dynamic programming matrix entries from scratch at each step using $O(kn)$ time per step².

Later, Auletta, Parente and Persiano [3] studied kML in the special case of unit lengths, i.e., $x_{i+1} = x_i + 1$ for all $i$, and no start-up costs, i.e., $c_i = 0$ for all $i$. Being unaware of Hassin and Tamir's results, they developed a new online technique for solving the problem which enabled them to add a new point in $O(k)$ amortized time, leading to an $O(kn)$ time algorithm for the static problem.

The major contribution of this paper is to bootstrap off of Auletta, Parente and Persiano's result to solve online kML when (i) the points can have arbitrary distances between them and (ii) start-up costs are allowed. In Section 2 we prove the following theorem.

Theorem 1. We can solve the online k-median on a line problem in $O(k)$ amortized and $O(k \log n)$ worst case time per update. These time bounds hold simultaneously. □

A variant of kML is the k-coverage problem (kCL), where the cost of servicing customer $x_j$ is zero if it is within distance $r_j$ of a service center, or $w_j$ otherwise. See Fig. 2 for the exact definition of kCL.

Hassin and Tamir [6] showed how to solve static kCL in $O(n^2)$ time (independent of $k$), again using the quadrangle inequality/concavity property. In Section 3 we restrict ourselves to the special case of uniform coverage, i.e., there is some $r > 0$ such that $r_j = r$ for all $j$. In this situation we can use a similar (albeit much simpler) approach as in Section 2 to maintain optimal partial solutions $S$ as points are added to the right of the line. In Section 3 we will develop the following theorem.

² Although not stated in [6], it is also possible to reformulate their DP formulation in terms of finding row-minima in $k$ totally monotone $n \times n$ matrices and then use the SMAWK algorithm [1] (which finds the row-minima of an $n \times n$ totally monotone matrix in $O(n)$ time) to find another $O(kn)$ solution. This was done explicitly in [11]. Unfortunately, the SMAWK algorithm is also inherently static, so this approach also cannot be extended to solve the online problem.

Theorem 2. We can solve the online k-coverage on a line problem with uniform coverage in $O(k)$ amortized and $O(k \log n)$ worst case time per update. These time bounds hold simultaneously. □

2 The k-Median Problem

2.1 Notations and Preliminary Facts 

In the online k-median problem, we start with an empty line and, at each step, append a new node to the right of all of the previous nodes. So, at step $m$ we will have $m$ points $x_1 < x_2 < \cdots < x_m$ and when adding the $(m+1)$st point we have $x_m < x_{m+1}$. Each node $x_j$ will have a weight $w_j$ and a start-up cost $c_j$ associated with it. At step $m$, the task is to pick a set $S$ of at most $k$ nodes from $x_1, x_2, \ldots, x_m$ that minimizes $\mathrm{cost}(S) = \sum_{x_i \in S} c_i + \sum_{j=1}^{m} w_j\, d_j(S)$.

Our algorithm actually keeps track of $2k$ median placements for every step. The first $k$ placements will be optimal placements for exactly $i = 1, \ldots, k$ resources, i.e., let

$$\mathrm{OPT}_i(m) = \min_{S \subseteq V_m,\ |S| = i} \Bigl[\sum_{x_t \in S} c_t + \sum_{j=1}^{m} w_j\, d_j(S)\Bigr].$$

The remaining $k$ placements are pseudo-optimal placements with the additional constraint that $x_m$ must be one of the chosen resources. That is, for $i = 1, \ldots, k$,

$$\mathrm{POPT}_i(m) = \min_{S \subseteq V_m,\ |S| = i,\ x_m \in S} \Bigl[\sum_{x_t \in S} c_t + \sum_{j=1}^{m} w_j\, d_j(S)\Bigr].$$



In particular, if $i = 1$, then $S = \{x_m\}$ and $\mathrm{POPT}_1(m) = c_m + \sum_{j=1}^{m} w_j (x_m - x_j)$. Optimal and pseudo-optimal placements are related by the following straightforward equations.

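Lemma 1.

$$\mathrm{OPT}_i(m) = \min_{1 \le j \le m} \Bigl(\mathrm{POPT}_i(j) + \sum_{l=j+1}^{m} w_l \cdot d(j,l)\Bigr) \qquad (1)$$

and

$$\mathrm{POPT}_i(m) = \min_{1 \le j \le m-1} \Bigl(\mathrm{OPT}_{i-1}(j) + \sum_{l=j+1}^{m-1} w_l \cdot d(l,m)\Bigr) + c_m, \qquad (2)$$

where $d(j,l) = x_l - x_j$ is the distance between $x_j$ and $x_l$. □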

Denote by $\mathrm{MIN}_i(m)$ the index $j$ at which the “min” operation in Eq. (1) achieves its minimum value and by $\mathrm{PMIN}_i(m)$ the index $j$ at which the “min” operation in Eq. (2) achieves its minimum value. When computing the $\mathrm{OPT}_i(m)$ and $\mathrm{POPT}_i(m)$ values the algorithm will also compute and keep the $\mathrm{MIN}_i(m)$ and $\mathrm{PMIN}_i(m)$ indices.

The optimum cost we want to find is $\mathrm{OPT} = \min_{1 \le i \le k} \mathrm{OPT}_i(n)$. It is not difficult to see that, knowing all values of $\mathrm{OPT}_i(m)$, $\mathrm{MIN}_i(m)$, $\mathrm{POPT}_i(m)$ and $\mathrm{PMIN}_i(m)$ for $1 \le i \le k$, $1 \le m \le n$, we can unroll the equations in Lemma 1 in $O(k)$ time to find the optimal set $S$ of at most $k$ medians that yields OPT. So, maintaining these $4kn$ variables suffices to solve the problem.

A straightforward calculation of the minimizations in Lemma 1 permits calculating the value of $\mathrm{POPT}_i(m)$ from those of $\mathrm{OPT}_{i-1}(j)$ in $O(m)$ time and the value of $\mathrm{OPT}_i(m)$ from those of $\mathrm{POPT}_i(j)$ in $O(m)$ time. This permits a dynamic programming algorithm that calculates all of the $\mathrm{OPT}_i(m)$ and $\mathrm{POPT}_i(m)$ values in $O\bigl(k \sum_{m=1}^{n} m\bigr) = O(kn^2)$ time, solving the problem.
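For reference, the straightforward $O(kn^2)$ dynamic program can be sketched in Python as follows (our illustration, not the authors' code). It evaluates Eqs. (2) and (1) for every $m$ and $i$, keeping running sums so that each entry costs $O(m)$, records the $\mathrm{MIN}_i$ and $\mathrm{PMIN}_i$ indices, and then unrolls Lemma 1 to recover the medians. Tables are 1-indexed, with `math.inf` and `None` marking entries that do not exist; the names `kml_tables` and `medians` are hypothetical. The sample data are those of the example in Section 2.6.

```python
import math

INF = math.inf


def kml_tables(xs, ws, cs, k):
    """Straightforward O(k n^2) DP for Eqs. (1) and (2) of Lemma 1.

    Returns 1-indexed tables OPT[i][m], POPT[i][m], MIN[i][m], PMIN[i][m];
    math.inf / None mark entries that do not exist (more medians than points).
    """
    n = len(xs)
    x, w, c = [None] + list(xs), [None] + list(ws), [None] + list(cs)
    OPT = [[INF] * (n + 1) for _ in range(k + 1)]
    POPT = [[INF] * (n + 1) for _ in range(k + 1)]
    MIN = [[None] * (n + 1) for _ in range(k + 1)]
    PMIN = [[None] * (n + 1) for _ in range(k + 1)]
    for m in range(1, n + 1):
        for i in range(1, k + 1):
            # Eq. (2): x_m is a median and serves x_{j+1}, ..., x_{m-1}.
            if i == 1:
                POPT[1][m] = c[m] + sum(w[l] * (x[m] - x[l]) for l in range(1, m))
            else:
                tail = 0                        # sum_{l=j+1}^{m-1} w_l d(l, m)
                for j in range(m - 1, 0, -1):
                    if OPT[i - 1][j] + tail < POPT[i][m]:
                        POPT[i][m], PMIN[i][m] = OPT[i - 1][j] + tail, j
                    tail += w[j] * (x[m] - x[j])
                POPT[i][m] += c[m]
            # Eq. (1): x_j is the rightmost median and serves x_{j+1}, ..., x_m.
            wsum = wxsum = 0                    # sums of w_l and w_l * x_l, l > j
            for j in range(m, 0, -1):
                val = POPT[i][j] + (wxsum - x[j] * wsum)
                if val < OPT[i][m]:
                    OPT[i][m], MIN[i][m] = val, j
                wsum += w[j]
                wxsum += w[j] * x[j]
    return OPT, POPT, MIN, PMIN


def medians(MIN, PMIN, i, m):
    """Unroll Lemma 1 from OPT_i(m) to recover the chosen medians (1-indexed)."""
    S = []
    j = MIN[i][m]                 # rightmost median of OPT_i(m)
    while True:
        S.append(j)
        if i == 1:
            return sorted(S)
        p = PMIN[i][j]            # x_1..x_p are handled by OPT_{i-1}(p)
        i -= 1
        j = MIN[i][p]


xs = [0, 5, 7, 10, 12, 13, 55, 72, 90]
cs = [5400, 2100, 3100, 100, 0, 9900, 8100, 7700, 13000]
ws = [14, 62, 47, 51, 35, 8, 26, 53, 14]
OPT, POPT, MIN, PMIN = kml_tables(xs, ws, cs, k=3)
best = min(range(1, 4), key=lambda i: OPT[i][9])
print(OPT[best][9], medians(MIN, PMIN, best, 9))   # 6089 [4, 5]
```

On that data the sketch returns cost 6089 with medians $x_4$ and $x_5$, in agreement with Table 1. Ties in the minimizations are broken toward the largest index $j$, which is an arbitrary choice.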

As discussed in the previous section, this is very slow. The rest of this section is devoted to improving this by an order of magnitude, developing an algorithm that, at step $m$ for each $i$, will calculate the value of $\mathrm{POPT}_i(m)$ from those of $\mathrm{OPT}_{i-1}(j)$ and the value of $\mathrm{OPT}_i(m)$ from those of $\mathrm{POPT}_i(j)$ in $O(1)$ amortized time and $O(\log n)$ worst case time.



2.2 The Functions $V_i(j,m,x)$ and $V'_i(j,m,x)$

As mentioned, our algorithm is actually an extension of the algorithm in [3]. In that paper, the authors defined two sets of functions which played important roles. We start by rewriting those functions using a slightly different notation which makes it easier to generalize their use. For all $1 \le i \le k$ and $1 \le j \le m$ define

$$V_i(j,m,x) = \mathrm{POPT}_i(j) + \sum_{l=j+1}^{m} w_l \cdot d(j,l) + x \cdot d(j,m). \qquad (3)$$

1=3 + 1 

For all $1 \le i \le k$ and $1 \le j \le m-1$ define

$$V'_i(j,m,x) = \mathrm{OPT}_{i-1}(j) + \sum_{l=j+1}^{m-1} w_l \cdot d(l,m) + x \cdot \sum_{l=j+1}^{m-1} w_l. \qquad (4)$$



Then Lemma 1 can be written as $\mathrm{OPT}_i(m) = \min_{1 \le j \le m} V_i(j,m,0)$ and $\mathrm{POPT}_i(m) = \min_{1 \le j \le m-1} V'_i(j,m,0) + c_m$.

The first major point of departure between this section and [3] is the following lemma, which basically says that $V_i(j,m,x)$ and $V'_i(j,m,x)$ can be computed in constant time when needed.

Lemma 2. Suppose we are given $W(m) = \sum_{l=1}^{m} w_l$ and $M(m) = \sum_{l=1}^{m} w_l \cdot d(1,l)$. Then, given the values of $\mathrm{POPT}_i(j)$, the function $V_i(j,m,x)$ can be evaluated at any $x$ in constant time. Similarly, given the values of $\mathrm{OPT}_{i-1}(j)$, the function $V'_i(j,m,x)$ can be evaluated at any $x$ in constant time.

Proof. We first examine $V_i(j,m,x)$. We already know $\mathrm{POPT}_i(j)$, so we only need to compute the terms $\sum_{l=j+1}^{m} w_l \cdot d(j,l) + x \cdot d(j,m)$. Since $d(j,l) = d(1,l) - d(1,j)$, we can compute $\sum_{l=j+1}^{m} w_l \cdot d(j,l) = [M(m) - M(j)] - [W(m) - W(j)] \cdot d(1,j)$ in constant time. For $V'_i(j,m,x)$, we also only need to compute $\sum_{l=j+1}^{m-1} w_l \cdot d(l,m) + x \cdot \sum_{l=j+1}^{m-1} w_l$. But, since $d(l,m) = d(1,m) - d(1,l)$, we can compute $\sum_{l=j+1}^{m-1} w_l \cdot d(l,m) = [W(m-1) - W(j)] \cdot d(1,m) - [M(m-1) - M(j)]$ and $\sum_{l=j+1}^{m-1} w_l = W(m-1) - W(j)$ in constant time. □
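The bookkeeping in Lemma 2 amounts to two prefix-sum arrays. The Python sketch below (our illustration; the class and method names are hypothetical) stores $W$ and $M$ once and then evaluates Eqs. (3) and (4) in constant time using the identities from the proof; the values of $\mathrm{POPT}_i(j)$ and $\mathrm{OPT}_{i-1}(j)$ are passed in by the caller.

```python
from itertools import accumulate


class PrefixSums:
    """W(m) = sum_{l<=m} w_l and M(m) = sum_{l<=m} w_l * d(1, l) (1-indexed)."""

    def __init__(self, xs, ws):
        self.d1 = [None] + [xj - xs[0] for xj in xs]          # d1[l] = d(1, l)
        self.W = [0] + list(accumulate(ws))
        self.M = [0] + list(accumulate(wl * dl for wl, dl in zip(ws, self.d1[1:])))

    def V(self, popt_j, j, m, x):
        """V_i(j, m, x) from Eq. (3) in O(1); popt_j = POPT_i(j)."""
        W, M, d1 = self.W, self.M, self.d1
        return popt_j + (M[m] - M[j]) - (W[m] - W[j]) * d1[j] + x * (d1[m] - d1[j])

    def V_prime(self, opt_j, j, m, x):
        """V'_i(j, m, x) from Eq. (4) in O(1); opt_j = OPT_{i-1}(j), j <= m - 1."""
        W, M, d1 = self.W, self.M, self.d1
        s = W[m - 1] - W[j]                                   # sum_{l=j+1}^{m-1} w_l
        return opt_j + s * d1[m] - (M[m - 1] - M[j]) + x * s


xs = [0, 5, 7, 10, 12, 13, 55, 72, 90]
ws = [14, 62, 47, 51, 35, 8, 26, 53, 14]
ps = PrefixSums(xs, ws)
print(ps.V(691, 5, 8, 0))               # 4997
print(ps.V_prime(691, 4, 6, 0) + 9900)  # 10626
```

With the data of Section 2.6, `ps.V(691, 5, 8, 0)` gives $V_2(5,8,0) = 4997 = \mathrm{OPT}_2(8)$, and the $j = 4$ candidate for $\mathrm{POPT}_2(6)$ is $726 + c_6 = 10626$, both as listed in Table 1.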

In the next two subsections we will see how to use this lemma to efficiently calculate $\mathrm{POPT}_i(j)$ and $\mathrm{OPT}_i(j)$.



2.3 Computing $\mathrm{OPT}_i(m)$

We start by explaining how to maintain the values of $\mathrm{OPT}_i(m)$. Our algorithm uses $k$ similar data structures to keep track of the $k$ sets of $\mathrm{OPT}_i(m)$ values, for $1 \le i \le k$. Since these $k$ structures are essentially the same, we will fix $i$ and consider how the data structure permits the computation of the values of $\mathrm{OPT}_i(m)$ as $m$ increases.



The Data Structures. Recall Eq. (3). Consider the $m$ functions $V_i(j,m,x)$ for $1 \le j \le m$. They are all linear functions in $x$, so the lower envelope of these functions is a piecewise linear function to which each $V_i(j,m,x)$ contributes at most one segment.

We are only interested in $\mathrm{OPT}_i(m) = \min_{1 \le j \le m} V_i(j,m,0)$, which is equivalent to evaluating this lower envelope at $x = 0$. In order to update the data structure efficiently, though, we will see that we will need to store the entire lower envelope for $x \ge 0$. We store the envelope by storing the changes in the envelope. More specifically, our data structures for computing the values of $\mathrm{OPT}_i(m)$ consist of two arrays

$$\Delta_i(m) = (\delta_0, \delta_1, \ldots, \delta_s) \quad\text{and}\quad Z_i(m) = (z_1, \ldots, z_s), \qquad (5)$$

such that

$$\text{if } \delta_{h-1} \le x + W(m) \le \delta_h, \text{ then } V_i(z_h,m,x) = \min_{j \le m} V_i(j,m,x). \qquad (6)$$

The reasons for the shift term $W(m) = \sum_{l=1}^{m} w_l$ will become clear later. Since we only keep the lower envelope for $x \ge 0$, we have $\delta_0 \le W(m) < \delta_1$.

An important observation is that the slope of $V_i(j,m,x)$ is $d(j,m)$, which decreases as $j$ increases, so we have $z_1 < \cdots < z_s$ and $z_s = m$ at step $m$. In particular, note that $V_i(m,m,x)$, which is the rightmost part of the lower envelope, has slope $0 = d(m,m)$ and is a horizontal line.

Given such a data structure, computing the value of $\mathrm{OPT}_i(m)$ becomes trivial. We simply have $\mathrm{MIN}_i(m) = z_1$ and $\mathrm{OPT}_i(m) = V_i(z_1,m,0)$.

Updating the Data Structures. Assume that the data structure given by Eqs. (5) and (6) is storing the lower envelope after step $m$ and, in step $m+1$, point $x_{m+1}$ is added. We now need to recompute the lower envelope of $V_i(j,m+1,x)$ for $1 \le j \le m+1$ and $x \ge 0$. Note that in step $m$ we have $m$ functions

$$\{V_i(j,m,x) : 1 \le j \le m\},$$

but we now have $m+1$ functions

$$\{V_i(j,m+1,x) : 1 \le j \le m+1\}.$$

If we only consider the lower envelope of the first $m$ functions $V_i(j,m+1,x)$ for $1 \le j \le m$, then the following lemma guarantees that the two arrays $\Delta_i(m)$ and $Z_i(m)$ do not change.

Lemma 3. Assume $V_i(z_h,m,x)$ minimizes $V_i(j,m,x)$ for $1 \le j \le m$ when $\delta_{h-1} \le x + W(m) \le \delta_h$. Then $V_i(z_h,m+1,x)$ minimizes $V_i(j,m+1,x)$ for $1 \le j \le m$ when $\delta_{h-1} \le x + W(m+1) \le \delta_h$.

Proof. It is easy to verify that for $1 \le j \le m$

$$V_i(j,m+1,x) = V_i(j,m,x + w_{m+1}) + (x + w_{m+1}) \cdot d(m,m+1).$$

Since $\delta_{h-1} \le x + W(m+1) \le \delta_h$ iff $\delta_{h-1} \le (x + w_{m+1}) + W(m) \le \delta_h$, and the second term does not depend on $j$, the above formula is minimized when $j = z_h$. □
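The identity used in the proof can be checked directly from Eq. (3) (a short derivation of our own): the terms $\mathrm{POPT}_i(j)$ and the summands for $l \le m$ cancel, and $d(j,m+1) - d(j,m) = x_{m+1} - x_m = d(m,m+1)$.

```latex
V_i(j,m+1,x) - V_i(j,m,x+w_{m+1})
  = w_{m+1}\, d(j,m+1) + x\, d(j,m+1) - (x + w_{m+1})\, d(j,m)
  = (x + w_{m+1})\bigl(d(j,m+1) - d(j,m)\bigr)
  = (x + w_{m+1})\, d(m,m+1).
```

Because the right-hand side is the same for every $j$, adding it to all of the functions leaves the minimizing index unchanged.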

This lemma is the reason for defining Eqs. (5) and (6) as we did, with the shift term, instead of simply keeping the breakpoints of the lower envelope in $\Delta_i(m)$.

Note that the lemma does not say that the lower envelope of the functions 
remains the same (this could not be true since all of the functions have been 
changed). What the lemma does say is that the structure of the breakpoints of 
the lower envelope is the same after the given shift. 

Now, we consider $V_i(m+1,m+1,x)$. As noted above, $V_i(m+1,m+1,x)$ is the rightmost segment of the lower envelope and is a horizontal line. So, we only need to find the intersection point between the lower envelope of $V_i(j,m+1,x)$ for $1 \le j \le m$ and the horizontal line $y = V_i(m+1,m+1,x)$. Assume they intersect at the segment $V_i(z_{\max},m+1,x)$. Then, $Z_i(m+1)$ becomes $(z_1, \ldots, z_{\max}, m+1)$, and $\Delta_i(m+1)$ changes correspondingly.

We can find this point of intersection either by using a binary search or a sequential search. The binary search would require $O(\log m)$ worst case comparisons between $y = V_i(m+1,m+1,x)$ and the lower envelope. The sequential search would scan the array $Z_i(m)$ from right to left, i.e., from $z_s$ to $z_1$, discarding segments from the lower envelope until we find the intersection point of $y = V_i(m+1,m+1,x)$ with the lower envelope. The sequential search might take $O(m)$ time in the worst case but uses only $O(1)$ amortized time, since lines thrown off the lower envelope will never be considered again in a later step.

In both methods a comparison operation requires being able to compare the constant $V_i(m+1,m+1,x)$ to $V_i(j,m+1,x)$ for some $j$ and some arbitrary value $x$. Recall from Lemma 2 that we can evaluate $V_i(j,m+1,x)$ at any particular $x$ in constant time. Thus, the total time required to update the lower envelope is $O(\log m)$ worst case and $O(1)$ amortized.

To combine the two bounds we perform the sequential and binary search alternately, i.e., we use sequential search in odd-numbered comparisons and binary search in even-numbered comparisons. The combined search finishes when the intersection point is first found. Thus, the running time is proportional to that of the search that finishes first, and we achieve both the $O(1)$ amortized time and the $O(\log m)$ worst case time.

Since we only keep the lower envelope for $x \ge 0$, we also need to remove from $Z_i(m+1)$ and $\Delta_i(m+1)$ the segments corresponding to negative $x$ values. Set $z_{\min} = \max\{z_h : \delta_{h-1} \le W(m+1) < \delta_h\}$. Then $Z_i(m+1)$ should be $(z_{\min}, \ldots, z_{\max}, m+1)$, and $\Delta_i(m+1)$ should change correspondingly.

To find $z_{\min}$, we also use the technique of combining sequential search and binary search. In the sequential search, we scan from left to right, i.e., from $z_1$ to $z_s$. The combined search also requires $O(1)$ amortized time and $O(\log m)$ worst case time.
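The update procedure can be sketched in Python as follows (our illustration, not the authors' implementation). For a fixed $i$, the class keeps $Z_i(m)$ and $\Delta_i(m)$ in deques, takes $\mathrm{POPT}_i(m)$ as an input supplied by the caller, evaluates $V_i$ via the prefix sums of Lemma 2, and returns $\mathrm{OPT}_i(m)$ and $\mathrm{MIN}_i(m)$. As a simplifying assumption it uses only the sequential searches, so it attains the $O(1)$ amortized bound but not the $O(\log m)$ worst-case bound, which would additionally need the alternating binary search; the name `Envelope` is hypothetical.

```python
import math
from collections import deque

INF = math.inf


class Envelope:
    """Lower envelope of V_i(j, m, x), 1 <= j <= m, on x >= 0 (Section 2.3).

    Z holds z_1 < ... < z_s and D holds delta_0, ..., delta_s (delta_s = inf):
    V_i(Z[h], m, x) is minimal while D[h] <= x + W(m) <= D[h + 1].
    Only the sequential search (O(1) amortized) is implemented here.
    """

    def __init__(self, xs, ws):
        self.xs, self.ws = xs, ws      # x_1 < x_2 < ... and weights (0-based lists)
        self.m = 0
        self.W = [0]                   # W[m] = w_1 + ... + w_m
        self.M = [0]                   # M[m] = sum_{l <= m} w_l * d(1, l)
        self.P = [None]                # P[j] = POPT_i(j), supplied by the caller
        self.Z, self.D = deque(), deque()

    def V(self, j, x):
        """Eq. (3) at the current step m, in O(1) via Lemma 2."""
        m, xs, W, M = self.m, self.xs, self.W, self.M
        return (self.P[j] + (M[m] - M[j]) - (W[m] - W[j]) * (xs[j - 1] - xs[0])
                + x * (xs[m - 1] - xs[j - 1]))

    def add_point(self, popt):
        """Advance to step m + 1 given POPT_i(m + 1); return (OPT_i, MIN_i)."""
        self.m += 1
        m, Z, D, xs = self.m, self.Z, self.D, self.xs
        self.W.append(self.W[-1] + self.ws[m - 1])
        self.M.append(self.M[-1] + self.ws[m - 1] * (xs[m - 1] - xs[0]))
        self.P.append(popt)
        Wm = self.W[m]
        if popt < INF:                           # new horizontal line y = popt
            while Z:                             # scan from z_s towards z_1
                z = Z[-1]
                start = D[-2] - Wm               # x where segment z begins
                if self.V(z, start) >= popt:     # line lies below the whole segment
                    Z.pop(); D.pop(); D[-1] = INF
                    continue
                slope = xs[m - 1] - xs[z - 1]    # d(z, m) > 0
                D[-1] = (popt - self.V(z, 0)) / slope + Wm   # crossing, shifted
                break
            if not Z:
                D.clear(); D.append(Wm)
            Z.append(m); D.append(INF)
        if not Z:
            return INF, None
        while len(Z) > 1 and D[1] <= Wm:         # drop segments with x < 0
            Z.popleft(); D.popleft()
        D[0] = Wm
        return self.V(Z[0], 0), Z[0]


xs = [0, 5, 7, 10, 12, 13, 55, 72, 90]
ws = [14, 62, 47, 51, 35, 8, 26, 53, 14]
popt2 = [INF, 7500, 5270, 2364, 691, 10626, 8885, 8927, 15649]   # POPT_2 row of Table 1
env = Envelope(xs, ws)
for p in popt2[:8]:
    opt, arg = env.add_point(p)
print(opt, arg, list(env.Z), list(env.D))   # 4997 5 [5, 8] [296, 361.5, inf]
```

Fed with the $\mathrm{POPT}_2$ row of Table 1, the sketch reproduces the $\mathrm{OPT}_2$ row and, after step 8, the arrays $Z_2(8) = (5,8)$ and $\Delta_2(8) = (296, 361.5, +\infty)$ given in Section 2.6. Storing each $\delta_h$ relative to $x + W(m)$ is what lets an update leave all older breakpoints untouched, as Lemma 3 states.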



2.4 Computing $\mathrm{POPT}_i(m)$

In the previous subsection we showed how to update the values of $\mathrm{OPT}_i(m)$ by maintaining a data structure that stores the lower envelope of $V_i(j,m,x)$ and evaluating the lower envelope at $x = 0$, i.e., $\mathrm{OPT}_i(m) = \min_{1 \le j \le m} V_i(j,m,0)$. In this subsection we will show how, in a very similar fashion, we can update the values of $\mathrm{POPT}_i(m)$ by maintaining a data structure that stores the lower envelope of $V'_i(j,m,x)$. Note that

$$\mathrm{POPT}_i(m) = c_m + \min_{1 \le j \le m-1} V'_i(j,m,0),$$

i.e., evaluating the lower envelope at $x = 0$ and adding $c_m$.

As before, we will be able to maintain the lower envelope of $V'_i(j,m,x)$, $1 \le j \le m-1$, in $O(1)$ amortized time and $O(\log m)$ worst case time. The data structure is almost the same as the one for maintaining $V_i(j,m,x)$ in the previous subsection, so we only quickly sketch the ideas.

As before, the algorithm uses $k$ similar data structures to keep track of the $k$ lower envelopes; for our analysis we fix $i$ and consider the data structures for maintaining the lower envelope of $V'_i(j,m,x)$ (and thus $\mathrm{POPT}_i(m)$) as $m$ increases.



The Data Structures. By their definitions, the $m-1$ functions $V'_i(j,m,x)$, for $1 \le j \le m-1$, are all linear functions, so their lower envelope is a piecewise linear function to which each $V'_i(j,m,x)$ contributes at most one segment.

As before, in order to compute the values of $\mathrm{POPT}_i(m)$, we only need to know the value of the lower envelope at $x = 0$ but, in order to update the structure efficiently, we will need to store the entire lower envelope.

Our data structures for computing the values of $\mathrm{POPT}_i(m)$ consist of two arrays

$$\Delta'_i(m) = (\delta'_0, \delta'_1, \ldots, \delta'_s) \quad\text{and}\quad Z'_i(m) = (z'_1, \ldots, z'_s), \qquad (7)$$

such that

$$\text{if } \delta'_{h-1} \le x + d(1,m) \le \delta'_h, \text{ then } V'_i(z'_h,m,x) = \min_{j \le m-1} V'_i(j,m,x). \qquad (8)$$

Since we only keep the lower envelope for $x \ge 0$, we have $\delta'_0 \le d(1,m) < \delta'_1$. Since the slopes of $V'_i(j,m,x)$ decrease when $j$ increases, we have $z'_1 < \cdots < z'_s$ and $z'_s = m-1$ at step $m$. In particular, note that $V'_i(m-1,m,x)$, the rightmost part of the lower envelope, has slope $0$ and is a horizontal line.

Given such data structures, computing the value of $\mathrm{POPT}_i(m)$ becomes trivial. We simply have $\mathrm{PMIN}_i(m) = z'_1$ and $\mathrm{POPT}_i(m) = c_m + V'_i(z'_1,m,0)$.

Updating the Data Structures. Given the lower envelope of $V'_i(j,m,x)$ for $1 \le j \le m-1$ at step $m$, we need to be able to recompute the lower envelope of $V'_i(j,m+1,x)$ for $1 \le j \le m$ after $x_{m+1}$ is added.

As before, we will first deal with the functions $V'_i(j,m+1,x)$ for $1 \le j \le m-1$, and then later add the function $V'_i(m,m+1,x)$.

If we only consider the functions $V'_i(j,m+1,x)$ for $1 \le j \le m-1$, we have an analogue of Lemma 3 for this case that guarantees that the two arrays $\Delta'_i(m)$ and $Z'_i(m)$ do not change.

Lemma 4. Assume $V'_i(z'_h,m,x)$ minimizes $V'_i(j,m,x)$ for $1 \le j \le m-1$ when $\delta'_{h-1} \le x + d(1,m) \le \delta'_h$. Then $V'_i(z'_h,m+1,x)$ minimizes $V'_i(j,m+1,x)$ for $1 \le j \le m-1$ when $\delta'_{h-1} \le x + d(1,m+1) \le \delta'_h$.

Since the proof is almost exactly the same as that of Lemma 3, we do not, in this extended abstract, provide further details.
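For completeness, here is one way to fill in the omitted proof, mirroring Lemma 3 (our derivation, not taken from the text). Substituting $d(l,m+1) = d(l,m) + d(m,m+1)$ into Eq. (4) gives, for $1 \le j \le m-1$,

```latex
V'_i(j,m+1,x)
  = \mathrm{OPT}_{i-1}(j) + \sum_{l=j+1}^{m} w_l\, d(l,m+1) + x \sum_{l=j+1}^{m} w_l \\
  = \mathrm{OPT}_{i-1}(j) + \sum_{l=j+1}^{m-1} w_l\, d(l,m)
      + \bigl(x + d(m,m+1)\bigr) \sum_{l=j+1}^{m-1} w_l + w_m \bigl(x + d(m,m+1)\bigr) \\
  = V'_i\bigl(j,m,x + d(m,m+1)\bigr) + w_m \bigl(x + d(m,m+1)\bigr).
```

The last term does not depend on $j$, and $\delta'_{h-1} \le x + d(1,m+1) \le \delta'_h$ holds iff $\delta'_{h-1} \le (x + d(m,m+1)) + d(1,m) \le \delta'_h$, so the minimizer $z'_h$ is unchanged, exactly as in Lemma 3.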

We note that, using exactly the same techniques as in the comments following Lemma 3, we can update the lower envelope of $V'_i(j,m,x)$ for $1 \le j \le m-1$ to the lower envelope of $V'_i(j,m+1,x)$ for $1 \le j \le m$ using a combined binary/sequential search that takes both $O(1)$ amortized and $O(\log m)$ worst case time per step (simultaneously).

2.5 The Algorithm 

Given the data structures developed in the previous subsections, the algorithm is very straightforward. After nodes $x_1 < x_2 < \cdots < x_m$ have been processed in step $m$, the algorithm maintains

– $W(m) = \sum_{l=1}^{m} w_l$ and $M(m) = \sum_{l=1}^{m} w_l \cdot d(1,l)$.

– For $1 \le i \le k$, the data structures described in Sections 2.3 and 2.4 for storing the lower envelopes $\min_{j \le m} V_i(j,m,x)$ and $\min_{j \le m-1} V'_i(j,m,x)$.

– For $1 \le i \le k$ and $1 \le j \le m$, all of the values $\mathrm{OPT}_i(j)$, $\mathrm{POPT}_i(j)$ and the corresponding indices $\mathrm{MIN}_i(j)$, $\mathrm{PMIN}_i(j)$.

Table 1. The values of $\mathrm{OPT}_i(m)/\mathrm{MIN}_i(m)$ in the upper rows and $\mathrm{POPT}_i(m)/\mathrm{PMIN}_i(m)$ in the lower rows.

i   row        m=1      m=2      m=3      m=4      m=5      m=6      m=7      m=8      m=9
1   OPT/MIN    5400/1   2170/2   2264/2   691/4    761/4    785/4    1955/4   5241/4   6337/5
    POPT/PMIN  5400/-   2170/-   3322/-   691/-    939/-    11048/-  18362/-  22093/-  32721/-
2   OPT/MIN    -/-      7500/2   5270/3   2364/4   691/5    699/5    1817/5   4997/5   6089/5
    POPT/PMIN  -/-      7500/1   5270/2   2364/3   691/4    10626/4  8885/6   8927/6   15649/6
3   OPT/MIN    -/-      -/-      10600/3  5370/4   2364/5   2372/5   3490/5   6670/5   7762/5
    POPT/PMIN  -/-      -/-      10600/2  5370/3   2364/4   10591/5  8799/6   8841/6   15563/6


After adding $x_{m+1}$ with associated values $w_{m+1}$ and $c_{m+1}$, the algorithm updates its data structures by

– Calculating $W(m+1) = W(m) + w_{m+1}$ and $M(m+1) = M(m) + w_{m+1}\, d(1,m+1)$ in $O(1)$ time.

– Updating the $2k$ lower envelopes as described in Sections 2.3 and 2.4 in $O(\log m)$ worst case and $O(1)$ amortized time (simultaneously) per envelope.

– For $1 \le i \le k$, calculating $\mathrm{OPT}_i(m+1) = \min_{j \le m+1} V_i(j,m+1,0)$ and $\mathrm{POPT}_i(m+1) = c_{m+1} + \min_{j \le m} V'_i(j,m+1,0)$ in $O(1)$ time each.

Thus, in each step, the algorithm uses, as claimed, only a total of $O(k \log n)$ worst case and $O(k)$ amortized time (simultaneously).

The algorithm above only fills in the dynamic programming table. But, given the values $\mathrm{OPT}_i(j)$, $\mathrm{POPT}_i(j)$ and the corresponding indices $\mathrm{MIN}_i(j)$, $\mathrm{PMIN}_i(j)$, one can construct the optimal set of medians in $O(k)$ time, so this fully solves the problem and finishes the proof of Theorem 1.



2.6 An Example 

In this example, let $n = 9$ be the total number of nodes, and $k = 3$ the maximum number of resources. The nodes have x-coordinates 0, 5, 7, 10, 12, 13, 55, 72, 90, start-up costs $c_j$ 5400, 2100, 3100, 100, 0, 9900, 8100, 7700, 13000, and weights $w_j$ 14, 62, 47, 51, 35, 8, 26, 53, 14. Table 1 shows the values of OPT, MIN, POPT and PMIN. From this table, we can see that for $m = 9$ the optimal placement is to have two resources at $x_4$ and $x_5$.
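The breakpoints used in the discussion of Figure 3 that follows can be recomputed by hand from Eq. (3) and Table 1 (our arithmetic, with $\mathrm{POPT}_2(5) = 691$, $\mathrm{POPT}_2(8) = 8927$, $\mathrm{POPT}_2(9) = 15649$, and the weight prefix sums $W(8) = 296$, $W(9) = 310$):

```latex
\begin{aligned}
m = 8:&\quad V_2(5,8,x) = 691 + (8\cdot 1 + 26\cdot 43 + 53\cdot 60) + 60x = 4997 + 60x,
       \quad V_2(8,8,x) = 8927,\\
      &\quad 4997 + 60x = 8927 \iff x = 65.5, \quad 65.5 + W(8) = 65.5 + 296 = 361.5;\\
m = 9:&\quad V_2(5,9,x) = 6089 + 78x, \quad V_2(8,9,x) = 8927 + 14\cdot 18 + 18x = 9179 + 18x,\\
      &\quad 6089 + 78x = 9179 + 18x \iff x = 51.5, \quad 51.5 + W(9) = 51.5 + 310 = 361.5,\\
      &\quad 9179 + 18x = 15649 \iff x \approx 359.4, \quad 359.4 + 310 \approx 669.4.
\end{aligned}
```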

Figure 3 shows the functions $V_2(j, 8, x)$ and $V_2(j, 9, x)$. The two arrays for the lower envelope when $m = 8$ are $(5, 8)$ (line labels) and $(296, 361.5, +\infty)$ (partition values). The two arrays for the lower envelope when $m = 9$ are $(5, 8, 9)$ and $(310, 361.5, 669.4, +\infty)$. As we can see, the intersection point of line 5 and line 8 in the left part of Figure 3 shifts to the left by $w_9$ when we add $x_9$ in the next step (right half of the figure), i.e., from 65.5 to 51.5. In fact, all intersection points shift by the same amount when a new node is added. That is why the partitioning value 361.5 does not change between the arrays for $m = 8$ and $m = 9$ ($361.5 = 65.5 + W(8) = 51.5 + W(9)$).
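The arrays in this example pair line labels with partition values and end with $+\infty$. This is the usual representation of a lower envelope of lines. The sketch below is a generic illustration only, not the paper's data structure from Sections 2.3 and 2.4. It maintains the lower envelope of lines $y = ax + b$ inserted in strictly decreasing slope order (the classic convex hull trick). For each surviving line it stores the $x$-coordinate from which that line is minimal. Insertions take amortized $O(1)$ time because each line is popped at most once, and a query takes $O(\log m)$ time by binary search. Integer coefficients are assumed so that breakpoints stay exact as fractions. The paper's trick of storing partition values offset by $W(m)$ is not modeled here.

```python
import bisect
from fractions import Fraction


class LowerEnvelope:
    """Lower envelope of lines y = a*x + b with integer a, b, inserted in
    strictly decreasing order of slope a (generic convex hull trick)."""

    def __init__(self):
        self.lines = []   # (a, b, label) of the envelope lines, left to right
        self.starts = []  # starts[i]: x from which lines[i] is minimal

    @staticmethod
    def _cross(l1, l2):
        # x-coordinate where lines l1 and l2 intersect (their slopes differ)
        return Fraction(l2[1] - l1[1], l1[0] - l2[0])

    def add(self, a, b, label):
        if self.lines and a >= self.lines[-1][0]:
            raise ValueError("slopes must be strictly decreasing")
        new = (a, b, label)
        while self.lines:
            x = self._cross(self.lines[-1], new)
            if x > self.starts[-1]:
                # the last line keeps the interval [starts[-1], x)
                self.lines.append(new)
                self.starts.append(x)
                return
            # the new line is below the last one on that line's whole interval
            self.lines.pop()
            self.starts.pop()
        self.lines.append(new)
        self.starts.append(float("-inf"))

    def query(self, x):
        """Return (minimum value at x, label of the minimizing line)."""
        i = bisect.bisect_right(self.starts, x) - 1
        a, b, label = self.lines[i]
        return a * x + b, label


env = LowerEnvelope()
env.add(2, 0, 1)
env.add(0, 1, 2)
env.add(-1, 3, 3)
print(env.query(1))  # (1, 2)
```

Adding a steeper line such as $y = -5x + 10$ removes line 3 from the envelope. This is the same kind of structural change that adding a node causes to the arrays in the example.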
3 The k-Coverage Problem

In this section we sketch how to solve online kCL with uniform coverage, i.e., how to maintain a $k$-placement $S$ minimizing

$$cost(S) = \sum_{x_i \in S} c_i + \sum_{j=1}^{n} w_j\, I_j(S)$$

as $m$ grows. Here $r$ is some fixed constant, $I_j(S) = 0$ if $d_j(S) \le r$, and $I_j(S) = 1$ if $d_j(S) > r$. This problem has a simpler DP solution than the $k$-median problem, albeit one with a similar flavor.

We say that $x_j$ is covered by a point in $S$ if $d_j(S) \le r$. For a point $x_j$, let $cov_j$ denote the index of the smallest of the points $x_1, \dots, x_j$ covered by $x_j$, and let $unc_j$ denote the index of the largest of the points $x_1, \dots, x_j$ not covered by $x_j$:

$$cov_j = \min\{i : i \le j \text{ and } r + x_i \ge x_j\}, \qquad unc_j = \max\{i : i \le j \text{ and } r + x_i < x_j\}.$$

Note that $x_{unc_j}$ is the point to the left of $x_{cov_j}$, i.e., $unc_j = cov_j - 1$ if this point exists. Among $x_1, \dots, x_j$, the points that can cover $x_j$ are exactly the points in $[x_{cov_j}, x_j]$.
Similar to the $k$-median problem, let $OPT_i(m)$ denote the minimum cost of an $i$-cover for the first $m$ points $x_1, \dots, x_m$, for $i = 1, \dots, k$, and let $POPT_i(m)$ be the minimum cost of covering $x_1, \dots, x_m$ if $x_m$ is one of the resources. Then

$$OPT_i(m) = \min\Big\{ w_m + OPT_i(m-1),\ \min_{cov_m \le j \le m} POPT_i(j) \Big\}, \qquad (9)$$

$$POPT_i(m) = c_m + \min_{unc_m \le j \le m-1} OPT_{i-1}(j). \qquad (10)$$

The first term in the minimum of Eq. (9) corresponds to the possibility that $x_m$ is not covered. The second term corresponds to the possibility that $x_m$ is covered; in that case $j$ ranges over all possible positions of the rightmost resource, each of which covers $x_m$. In Eq. (10), the points $x_{unc_m+1}, \dots, x_{m-1}$ are covered by $x_m$ itself, so the remaining $i-1$ resources only have to handle a prefix $x_1, \dots, x_j$ with $j \ge unc_m$.

Fig. 3. Functions $V_2(j, 8, x)$, for $j = 2, \dots, 8$, and $V_2(j, 9, x)$, for $j = 2, \dots, 9$. The lines are labelled by $j$. The thick lines are the lower envelopes.

To solve the problem online, we need to calculate the values of $OPT_i(m)$ and $POPT_i(m)$ efficiently at step $m$, when processing $x_m$. This can be done in a fashion similar to that employed for kML, with a similar result: all of the $OPT_i(m)$ and $POPT_i(m)$ values can be computed in $O(k)$ amortized time and $O(k \log m)$ worst-case time per update. The details can be found in the full version of this paper.
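Recurrences (9) and (10) can be checked directly with a naive evaluator. The Python sketch below is a straightforward $O(k m^2)$ implementation, not the $O(k)$ amortized method of the paper. Its boundary conventions are assumptions of the sketch:
- $OPT_i(m)$ and $POPT_i(m)$ use exactly $i$ resources;
- $OPT_0(m)$ is the total weight of $x_1, \dots, x_m$;
- $OPT_i(0) = \infty$ for $i \ge 1$;
- $unc_m = 0$ when every point up to $x_m$ is covered by $x_m$.

The function names are hypothetical, and the DP is cross-checked against brute force.

```python
import bisect
from itertools import combinations

INF = float("inf")


def kcover_dp(xs, cs, ws, r, k):
    """Evaluate (9) and (10) naively; xs must be sorted. Returns tables
    OPT[i][m], POPT[i][m] for 0 <= i <= k and 0 <= m <= len(xs)."""
    n = len(xs)
    # cov[m] (1-based): smallest index among x_1..x_m within r to the left of x_m
    cov = [0] + [bisect.bisect_left(xs, xs[m - 1] - r, 0, m) + 1
                 for m in range(1, n + 1)]
    OPT = [[INF] * (n + 1) for _ in range(k + 1)]
    POPT = [[INF] * (n + 1) for _ in range(k + 1)]
    OPT[0][0] = 0
    for m in range(1, n + 1):
        OPT[0][m] = OPT[0][m - 1] + ws[m - 1]  # no resources: every point pays
    for i in range(1, k + 1):
        for m in range(1, n + 1):
            unc = cov[m] - 1  # 0 if all of x_1..x_m are covered by x_m
            # (10): x_m is a resource and covers x_{unc+1}, ..., x_{m-1}
            POPT[i][m] = cs[m - 1] + min(OPT[i - 1][j] for j in range(unc, m))
            # (9): x_m is uncovered, or the rightmost resource x_j covers it
            OPT[i][m] = min(ws[m - 1] + OPT[i][m - 1],
                            min(POPT[i][j] for j in range(cov[m], m + 1)))
    return OPT, POPT


def kcover_brute(xs, cs, ws, r, i):
    """Minimum cost with exactly i resources, by exhaustive search."""
    best = INF
    for S in combinations(range(len(xs)), i):
        cost = sum(cs[s] for s in S)
        cost += sum(w for x, w in zip(xs, ws)
                    if all(abs(x - xs[s]) > r for s in S))
        best = min(best, cost)
    return best


OPT, _ = kcover_dp([0, 3, 10], [5, 5, 5], [4, 4, 4], 3, 2)
print(OPT[1][3])  # 9: one resource at 0 or 3 covers two points; 10 pays 4
```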

4 Conclusion and Open Problems 

In this paper we discussed how to solve the online k-median on a line problem in $O(k)$ amortized time and $O(k \log n)$ worst-case time per point addition. This algorithm maintains, in the online model, the dynamic programming speed-up for the problem that was first demonstrated for the static version of the problem in [6]. The technique used is a generalization of one introduced in [3]. We also showed how a simpler form of our approach can solve the online k-coverage on a line problem with uniform coverage radius in the same time bounds. It is not clear how to extend our ideas to the non-uniform coverage radius case.

A major open question is how to solve the dynamic k-median and k-coverage on a line problems. That is, points will now be allowed to be inserted (or deleted!) anywhere on the line and not just on the right-hand side. In this case, would it be possible to maintain the k-medians or k-covers any quicker than recalculating them from scratch each time?
Abstract. We show that the combinatorial complexity of the overlay of the lower envelopes of two collections of $d$-variate piecewise linear functions of overall combinatorial complexity $n$ is $\Omega(n^d\alpha^2(n))$ and $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$ when $d > 2$, and $O(n^2\alpha(n)\log n)$ when $d = 2$. This extends and improves the analysis of de Berg et al. [9]. We also describe an algorithm that constructs the overlay in the same time.

We apply these results to obtain efficient general solutions to the problem of matching two polyhedral terrains in $\mathbb{R}^{d+1}$ under translation. For the perpendicular distance measure, which we adopt from functional analysis, we present a matching algorithm that runs in time $O(n^{2d+\varepsilon})$ for any $\varepsilon > 0$. For the directed and undirected Hausdorff distance measures, we present a matching algorithm that runs in time $O(n^{d^2+d+\varepsilon})$ for any $\varepsilon > 0$.



1 Introduction 

Overlays of Envelopes. The arrangement $\mathcal{A}(\mathcal{F})$ of a collection $\mathcal{F}$ of graphs of $d$-variate functions (i.e., functions of $d$ variables) in $\mathbb{R}^{d+1}$ is the subdivision of $\mathbb{R}^{d+1}$ induced by $\mathcal{F}$. The lower envelope $\mathcal{E}(\mathcal{F})$ of $\mathcal{A}(\mathcal{F})$ is the pointwise minimum of the functions of $\mathcal{F}$. For two collections $\mathcal{F}$ and $\mathcal{G}$ as above, the sandwich region $\mathcal{S}(\mathcal{F}, \mathcal{G})$ consists of all points that lie below the lower envelope of $\mathcal{A}(\mathcal{F})$ and above the upper envelope of $\mathcal{A}(\mathcal{G})$ (defined as the pointwise maximum of the functions of $\mathcal{G}$). The minimization diagram $\mathcal{M}(\mathcal{F})$ of $\mathcal{E}(\mathcal{F})$ is the subdivision of $\mathbb{R}^d$ obtained by projecting $\mathcal{E}(\mathcal{F})$ onto the hyperplane $x_{d+1} = 0$. The overlay $\mathcal{O}(\mathcal{F}, \mathcal{G})$ of envelopes $\mathcal{E}(\mathcal{F})$ and $\mathcal{E}(\mathcal{G})$ is the refined subdivision obtained by superimposing $\mathcal{M}(\mathcal{F})$ and $\mathcal{M}(\mathcal{G})$ in $\mathbb{R}^d$. The last definition can be naturally extended to the overlay $\mathcal{O}(\mathcal{F}_1, \mathcal{F}_2, \dots, \mathcal{F}_k)$ of envelopes of arrangements of multiple collections $\mathcal{F}_1, \dots, \mathcal{F}_k$. The (combinatorial) complexity of each structure introduced above is defined to be the overall number of its faces (of all dimensions).

* A limited preliminary version of some of the results described in this paper has appeared in the second author's Ph.D. thesis [22].

** Supported in part by NSF Grant CCR-0T21555.

T. Hagerup and J. Katajainen (Eds.): SWAT 2004, LNCS 3111, pp. 114–126, 2004.

The study of lower envelopes and related structures has a long and rich history in computational geometry, as they have innumerable applications to various problems in this field; see Sharir and Agarwal [20] for an overview. Edelsbrunner et al. [10,11] and Pach and Sharir [18] have shown that the complexity of $\mathcal{E}(\mathcal{F})$ and of $\mathcal{S}(\mathcal{F}, \mathcal{G})$ is $O(n^d\alpha(n))$ when $\mathcal{F}$ and $\mathcal{G}$ are collections of piecewise linear (possibly partially defined) functions in $\mathbb{R}^{d+1}$ of overall complexity $n$; here $\alpha(n)$ is the inverse Ackermann function. Agarwal et al. [2] have shown that when $\mathcal{F}$ and $\mathcal{G}$ consist of $n$ semi-algebraic bivariate functions of constant description complexity, $\mathcal{O}(\mathcal{F}, \mathcal{G})$ has complexity $O(n^{2+\varepsilon})$ for any $\varepsilon > 0$. Recently, Koltun and Sharir [13] have shown that for analogous collections $\mathcal{F}$ and $\mathcal{G}$ of trivariate functions, the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $O(n^{3+\varepsilon})$ for any $\varepsilon > 0$.

This paper deals with the special case of collections $\mathcal{F}$ and $\mathcal{G}$ of piecewise linear (possibly partially defined) $d$-variate functions of overall complexity $n$. A relevant result is that of de Berg et al. [9], who have studied the complexity of the vertical decomposition of an arrangement of a set of triangles in $\mathbb{R}^3$. Although their paper does not explicitly discuss overlays of envelopes, their analysis implies that the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $\Omega(n^2\alpha^2(n))$ and $O(n^2 2^{\alpha(n)}\log n)$ when $\mathcal{F}$ and $\mathcal{G}$ are as above and $d = 2$. We extend this analysis to show that the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $\Omega(n^d\alpha^2(n))$ and $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$ when $d > 2$. This provides the first non-trivial upper bound on the complexity of the overlay of envelopes in dimensions $d \ge 3$. For $d = 2$ we prove a sharper bound of $O(n^2\alpha(n)\log n)$. We also show that the complexity of $\mathcal{O}(\mathcal{F}_1, \dots, \mathcal{F}_d)$ is $\Omega(n^d\alpha^d(n))$ for collections $\mathcal{F}_1, \dots, \mathcal{F}_d$ as above. Finally, we describe an algorithm for constructing $\mathcal{O}(\mathcal{F}, \mathcal{G})$, $\mathcal{S}(\mathcal{F}, \mathcal{G})$ and $\mathcal{E}(\mathcal{F})$ in time that matches the respective complexity bounds.

Matching Terrains. The comparison of geometric objects is a task that naturally 
arises in many application areas, such as computer vision, computer aided design, 
robotics, medical imaging, etc. In many applications we are given a set of allowed 
transformations, and wish to match the shapes under these transformations, 
that is, to find an allowed transformation that, when applied to the first object, 
minimizes its distance (under some specific distance measure) to the second one. 
A natural transformation class is that of translations, which forms the focus of 
our work. See Alt and Guibas [4] for an overview of matching algorithms for 
various types of objects, distance measures and transformation classes. 

Most matching algorithms in the existing literature either deal with two-dimensional problems or only consider point sets. Algorithms for matching shapes more complicated than points in dimensions higher than two have been presented for the first time only recently, in the second author's thesis [22]. There it has been shown that a translation that minimizes the Hausdorff distance between two polyhedral sets of total complexity $n$ in $\mathbb{R}^d$ can be computed in $O(n^{d^2+3d+2})$ time for $d > 2$. The only other higher-dimensional result we recently learned about is the result by Agarwal et al. [1], who compute the minimum Hausdorff distance under translations for two sets of $m$ and $n$ $L_2$-balls in $\mathbb{R}^3$ in $O(m^2n^2(m+n)\log^3(mn))$ time.
Terrains are a natural subset of shapes that have particularly many applications in $\mathbb{R}^3$, especially for geographical data. However, terrains are an important class of shapes in higher dimensions as well, since they are graphs of arbitrary $d$-variate functions. In Section 3 we present algorithms for matching polyhedral terrains in $\mathbb{R}^{d+1}$ under translations, for arbitrary $d \ge 1$. These algorithms reduce the matching task to the computation of certain overlays and sandwich regions of envelopes of collections of piecewise linear functions. We show that we can compute a translation of a terrain of complexity $m$ that minimizes its perpendicular distance (an adaptation of the $L_\infty$-Minkowski metric used in functional analysis) to a terrain of complexity $n$ in time $O((mn)^{d+\varepsilon})$ for any $\varepsilon > 0$. Sharper running time bounds are obtained for $d \le 2$.

Assuming that the terrains are continuously defined over a convex domain, we provide an algorithm that matches two terrains of complexity $n$ under the (directed or undirected) Hausdorff distance measure in time $O(n^{d^2+d+\varepsilon})$ for any $\varepsilon > 0$. Moreover, for the directed Hausdorff distance our algorithm applies even when we are matching a terrain with an arbitrary polyhedral set. For technical reasons, we assume that the metric in terms of which the Hausdorff distance is defined belongs to a certain class of convex polyhedral metrics of constant description complexity that includes, for instance, the $L_\infty$- and $L_1$-metrics.



2 Overlays of Envelopes of Piecewise Linear Functions 

2.1 Lower Bounds 

In this section we describe two simple constructions of collections of $n$ $d$-simplices in $\mathbb{R}^{d+1}$, for any $d \ge 2$, that define overlays of high complexity. Since $d$-simplices are special cases of piecewise linear $d$-variate functions, our lower bound naturally extends to the latter, more general family of objects. When $d = 2$ both constructions are the same and are identical to the construction presented by de Berg et al. [9]. Throughout the remainder of the paper, denote the axes in the $(d+1)$-dimensional space by $x_1, \dots, x_{d+1}$, and let $x_{d+1}$ denote the vertical direction in $\mathbb{R}^{d+1}$, in terms of which the lower and upper envelopes are defined.

Theorem 1. For $d \ge 2$, there are collections $\mathcal{F}$ and $\mathcal{G}$ of $O(n)$ $d$-simplices in $\mathbb{R}^{d+1}$ for which the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $\Omega(n^d\alpha^2(n))$.

Proof. For the sake of clarity, we describe the construction using infinite axis-parallel $d$-dimensional "strips". It can be trivially modified to use finite $d$-simplices in general position.

There exists a collection of $n$ line segments in the plane such that the complexity of the lower envelope of their arrangement is $\Omega(n\alpha(n))$; see [23]. Consider such a collection $L$ in the $x_1x_{d+1}$-plane. Without loss of generality, assume that the $x_{d+1}$-coordinates of the segments in $L$ are strictly positive. Take the Cartesian product of each segment with the $(d-1)$-flat $x_1 = x_{d+1} = 0$. Let $\mathcal{F}$ be the resulting collection of $d$-dimensional strips.

Consider an analogously constructed collection $\mathcal{C}_1$ of strips, this time orthogonal to the $x_2x_{d+1}$-plane. For $2 \le i \le d-1$, consider also the collection $\mathcal{C}_i = \{s_1, \dots, s_n\}$ of strips, where $s_j$ is the Cartesian product of the line segment $((2j, 0), (2j+1, 0))$, drawn in the $x_{i+1}x_{d+1}$-plane, with the $(d-1)$-flat $x_{i+1} = x_{d+1} = 0$. Define $\mathcal{G} = \bigcup_{i=1}^{d-1} \mathcal{C}_i$.

When $d = 2$, overlaying $\mathcal{M}(\mathcal{F})$ and $\mathcal{M}(\mathcal{G})$ results in a grid of $\Omega(n\alpha(n)) \times \Omega(n\alpha(n))$ lines, thus producing $\Omega(n^2\alpha^2(n))$ vertices. In higher dimensions, overlaying $\mathcal{M}(\mathcal{F})$ and $\mathcal{M}(\mathcal{C}_1)$ similarly produces $\Omega(n^2\alpha^2(n))$ infinite $(d-2)$-flats orthogonal to the $x_1$ and $x_2$ axes. The (partial) diagram $\mathcal{M}(\mathcal{C}_2 \cup \dots \cup \mathcal{C}_{d-1})$, on the other hand, essentially contains a grid of $\Omega(n^{d-2})$ $x_1x_2$-parallel planes, each belonging to the boundary of the intersection of $d-2$ projections of strips, one strip from each of the groups $\mathcal{C}_2, \dots, \mathcal{C}_{d-1}$. In the overlay $\mathcal{O}(\mathcal{F}, \mathcal{G})$, each of the latter planes intersects all of the former $(d-2)$-flats, resulting in $\Omega(n^d\alpha^2(n))$ vertices.

Theorem 2. There are $d$ collections of $n$ $d$-simplices in $\mathbb{R}^{d+1}$ such that the complexity of the overlay of the $d$ respective lower envelopes is $\Omega(n^d\alpha^d(n))$.

Proof. For $1 \le i \le d$, let $\mathcal{F}_i$ be a collection of $d$-dimensional strips orthogonal to the $x_ix_{d+1}$-plane, constructed as follows. Consider, as above, a collection $\Gamma$ of $n$ segments, drawn in the $x_ix_{d+1}$-plane, such that the complexity of $\mathcal{E}(\Gamma)$ is $\Omega(n\alpha(n))$. We define $\mathcal{F}_i$ to be the collection of Cartesian products of the segments of $\Gamma$ with the $(d-1)$-flat $x_i = x_{d+1} = 0$. The complexity of $\mathcal{O}(\mathcal{F}_1, \mathcal{F}_2, \dots, \mathcal{F}_d)$ is easily seen to be as claimed.

Remark 1. Theorem 1 and the earlier construction of de Berg et al. [9] dispel a belief, expressed, e.g., in [3], that the analysis of Edelsbrunner et al. [11] implies a bound of $O(n^d\alpha(n))$ on the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ when $\mathcal{F}$ and $\mathcal{G}$ are collections of piecewise linear functions of overall complexity $n$ in $\mathbb{R}^{d+1}$.



2.2 Upper Bounds 

We note that it is sufficient to analyze collections $\mathcal{F}$ and $\mathcal{G}$ of $n$ $d$-simplices in general position, as such analysis easily carries over to arbitrary collections of piecewise linear functions of overall complexity $n$. We will thus confine ourselves to this setting. It is also easy to see that it is sufficient to count the vertices of $\mathcal{O}(\mathcal{F}, \mathcal{G})$. All higher-dimensional faces of the overlay can be charged to its vertices such that each vertex is charged at most a constant number of times.

Lemma 1. Let $\mathcal{F}$ be a collection of $n$ $d$-simplices in $\mathbb{R}^{d+1}$, and let $P$ be a $(j+1)$-dimensional convex body contained in the hyperplane $x_{d+1} = 0$. Then the combinatorial complexity of $\mathcal{E}(\mathcal{F}_{\partial P})$ is $O(n^{j+\varepsilon})$ for any $\varepsilon > 0$. Here $\mathcal{F}_{\partial P}$ is the collection of cross-sections of the simplices of $\mathcal{F}$ within the $x_{d+1}$-vertical surface $\partial P \times x_{d+1}$ spanned by the boundary $\partial P$ of $P$.

Proof. Notice that $\mathcal{F}_{\partial P}$ is a collection of surfaces that are graphs of $j$-variate functions, partially defined over $\partial P$. Furthermore, every $(j+1)$-tuple of surfaces intersects in at most two points. In addition, a vertical projection of a $k$-dimensional feature ($0 \le k \le j$) of the arrangement $\mathcal{A}(\mathcal{F}_{\partial P})$ onto $\partial P$ intersects an analogous projection of a $(j-k)$-dimensional feature of $\mathcal{A}(\mathcal{F}_{\partial P})$ in at most two points. In the full version of this paper [14] we show that the analysis of Sharir [19] for the complexity of the lower envelope of an arrangement of semi-algebraic surfaces carries over to our setting.



Theorem 3. Given two collections $\mathcal{F}$ and $\mathcal{G}$ of $n$ $d$-simplices in $\mathbb{R}^{d+1}$, the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$.

Proof. Our proof relies on the concept of efficient hierarchical cuttings [7,16] and is based on the proof technique of de Berg et al. [9]. For the case $d = 2$ we provide a sharper upper bound of $O(n^2\alpha(n)\log n)$ (Theorem 4) using the analysis technique of Tagansky [21]; this improves the result of de Berg et al. [9].

A $(1/r)$-cutting $\Xi$ of a set $\Gamma$ of $n$ hyperplanes in $\mathbb{R}^d$ is a subdivision of the space into simplices such that each simplex is intersected by at most $n/r$ hyperplanes of $\Gamma$. The size of $\Xi$, denoted by $|\Xi|$, is defined to be the number of simplices in the subdivision. A cutting $\Xi'$ is said to $C$-refine a cutting $\Xi$ if every simplex of $\Xi'$ is completely contained in some simplex of $\Xi$, and every simplex of $\Xi$ contains at most $C$ simplices of $\Xi'$. Let $C$ and $\rho$ be appropriate constants. A sequence $H = \Xi_0, \Xi_1, \dots, \Xi_k$ is called an efficient hierarchical $(1/r)$-cutting of $\Gamma$ if the following conditions hold:
- $\Xi_0$ consists of the single degenerate "simplex" $\mathbb{R}^d$;
- for all $1 \le i \le k$, $\Xi_i$ is a $(1/\rho^i)$-cutting of $\Gamma$ of size $O(\rho^{di})$ that $C$-refines $\Xi_{i-1}$;
- $\rho^{k-1} < r \le \rho^k$. (Thus, $k = \lceil \log_\rho r \rceil$.)

For any simplex $s$ in $\Xi_i$, the simplex of $\Xi_{i-1}$ that contains $s$ is said to be the parent of $s$, denoted by parent($s$).
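The number of levels follows from $\rho^{k-1} < r \le \rho^k$. The sketch below computes $k = \lceil \log_\rho r \rceil$ with integer arithmetic, because floating-point logarithms can misround at exact powers of $\rho$. It assumes an integer $r \ge 1$ and an integer $\rho \ge 2$, and the function name is hypothetical.

```python
def num_levels(r, rho):
    """Smallest integer k >= 0 with rho**k >= r, i.e. rho**(k-1) < r <= rho**k.

    Assumes integers r >= 1 and rho >= 2; exact integer arithmetic only.
    """
    if r < 1 or rho < 2:
        raise ValueError("need r >= 1 and rho >= 2")
    k, power = 0, 1
    while power < r:
        power *= rho
        k += 1
    return k


print(num_levels(8, 2), num_levels(9, 2))  # 3 4
```

With $r = n$, the hierarchy used in the proof has $O(\log n)$ levels, and the sizes $|\Xi_i| = O(\rho^{di})$ grow geometrically from level to level.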

To analyze the number of vertices of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ we first describe their combinatorial structure. For $d + 1 = 3$, each vertex of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is either a vertex of $\mathcal{M}(\mathcal{F})$ or $\mathcal{M}(\mathcal{G})$, or an intersection of an edge of $\mathcal{M}(\mathcal{F})$ with an edge of $\mathcal{M}(\mathcal{G})$ [2]. Similarly, it is easy to check that for any $d + 1 \ge 3$, a vertex of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is either a vertex of $\mathcal{M}(\mathcal{F})$ or $\mathcal{M}(\mathcal{G})$, or an intersection of a $j$-face of $\mathcal{M}(\mathcal{F})$ with a $(d-j)$-face of $\mathcal{M}(\mathcal{G})$, for some $1 \le j \le d-1$. We call the vertices of the latter type $j$-vertices. It is known that the number of vertices of $\mathcal{M}(\mathcal{F})$ and $\mathcal{M}(\mathcal{G})$ is $O(n^d\alpha(n))$ [10]. To bound the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ it remains to analyze the number of $j$-vertices, for $1 \le j \le d-1$.

Now, consider the $(d-1)$-faces of the projections of the simplices of $\mathcal{F}$ and $\mathcal{G}$ onto $x_{d+1} = 0$. For each such face, consider the $(d-1)$-hyperplane it spans. Let $\mathcal{H}$ be the collection of $2(d+1)n$ hyperplanes defined in this manner by all the simplices of $\mathcal{F}$ and $\mathcal{G}$. Construct an efficient hierarchical $(1/n)$-cutting of $\mathcal{H}$. By definition, each simplex at the last level of the cutting is crossed by at most a constant number of faces of the projections of $\mathcal{F}$ and $\mathcal{G}$. For convenience, we add one more refinement level to the hierarchical cutting such that no simplex belonging to this final level is cut by a face as above. We thus get a final hierarchy $H = \Xi_0, \Xi_1, \dots, \Xi_k$. It satisfies the above definition of an efficient hierarchical cutting, with appropriate constants $\rho$ and $C$, and has the additional property that the simplices of $\Xi_k$ are not intersected by the boundaries of the simplices in the projections of $\mathcal{F}$ and $\mathcal{G}$.

For any simplex $s$ belonging to some level of $H$, define $\mathcal{F}_s^b$ to be the set of simplices of $\mathcal{F}$ whose projections intersect the interior of $s$ with their boundaries. Also let $\mathcal{F}_s^c$ be the set of simplices of $\mathcal{F}$ whose projections contain $s$ but do not contain parent($s$). Define $\mathcal{G}_s^b$ and $\mathcal{G}_s^c$ analogously with respect to $\mathcal{G}$, and put $\Gamma_s^b = \mathcal{F}_s^b \cup \mathcal{G}_s^b$ and $\Gamma_s^c = \mathcal{F}_s^c \cup \mathcal{G}_s^c$. Also set $\mathcal{F}_s = \mathcal{F}_s^b \cup \mathcal{F}_s^c$, and define $\mathcal{G}_s$ and $\Gamma_s$ analogously.

Consider a $j$-vertex $v$ of $\mathcal{O}(\mathcal{F}, \mathcal{G})$, for some $1 \le j \le d-1$. It is an intersection of a $j$-face of $\mathcal{M}(\mathcal{F})$ with a $(d-j)$-face of $\mathcal{M}(\mathcal{G})$. These faces are defined by $d+1-j$ simplices of $\mathcal{F}$ and by $j+1$ simplices of $\mathcal{G}$, respectively. The collection of these $(d+1-j) + (j+1) = d+2$ simplices is said to define $v$, and is denoted by def($v$).

We claim that for every $v$ as above there exists a simplex $s$, belonging to some level of $H$, such that def($v$) is contained in $\Gamma_s$ and at least one of the simplices of def($v$) belongs to $\Gamma_s^c$. Indeed, at every level $\Xi_i$ of $H$ there is a simplex $s_i$ that contains $v$, and every simplex $s_i$ in the sequence $s_0, \dots, s_k$ contains $s_{i+1}$ (unless, of course, $i = k$). Since $s_0$ is the whole space $\mathbb{R}^d$, def($v$) is completely contained in $\Gamma_{s_0}^b$. On the other hand, $\Gamma_{s_k}^b$ is empty by construction. Thus, there exists an $i$ such that all simplices of def($v$) are contained in $\Gamma_{s_i}^b$, but at least one of them is not contained in $\Gamma_{s_{i+1}}^b$. Because every simplex of def($v$) lies in $\Gamma_{s_i}^b$, none of their projections contains $s_i = \mathrm{parent}(s_{i+1})$. The projection of each simplex of def($v$) contains $v \in s_{i+1}$. Hence every simplex of def($v$) that is not in $\Gamma_{s_{i+1}}^b$ contains $s_{i+1}$ in its projection and belongs to $\Gamma_{s_{i+1}}^c$. This proves our claim.

This claim implies that, to count all $j$-vertices $v$ as above, it suffices to consider all simplices $s$ of $H$. For each such simplex we count the vertices defined only by simplices from $\Gamma_s$ with at least one simplex coming from $\mathcal{F}_s^c$; the case of $\mathcal{G}_s^c$ is symmetric, so this is without loss of generality. Let us consider a specific simplex $s$ of $H$ and a specific value of $j$, and derive an upper bound on the number of $j$-vertices $v$ that correspond to $s$ in this fashion.

The $j$-face of $\mathcal{M}(\mathcal{F})$ that defines $v$ lies on the projection of an intersection of $d+1-j$ simplices of $\mathcal{F}_s$. Our assumption implies that at least one of them belongs to $\mathcal{F}_s^c$. Consider some $(d-j)$-tuple of simplices of $\mathcal{F}_s$, and their $(j+1)$-dimensional intersection surface, which lies on a $(j+1)$-flat. For some tuple as above, the $j$-face of $\mathcal{E}(\mathcal{F})$ that defines $v$ lies on the intersection of this surface with the lower envelope $\mathcal{E}(\mathcal{F}_s^c)$. Notice that the simplices of $\mathcal{F}_s^c$ are totally defined over $s$. Thus, over $s$, the envelope $\mathcal{E}(\mathcal{F}_s^c)$ behaves as the lower envelope of a collection of hyperplanes, which is a convex polytope. The intersection of the above $(j+1)$-dimensional surface with $\mathcal{E}(\mathcal{F}_s^c)$ is thus part of a $j$-dimensional convex polytope, defined as the intersection of $\mathcal{E}(\mathcal{F}_s^c)$ with the $(j+1)$-flat containing the surface.

Consider the projection $P$ of this polytope onto the hyperplane $x_{d+1} = 0$, and consider the cross-section of $\mathcal{M}(\mathcal{G}_s)$ within $\partial P$. It is the projection of the part of $\mathcal{E}(\mathcal{G}_s)$ that lies over $\partial P$. Lemma 1 thus implies that its complexity is $O(|\mathcal{G}_s|^{j+\varepsilon})$ for any $\varepsilon > 0$. Any $j$-vertex $v$ as above clearly corresponds to a vertex in this cross-section, for some $(d-j)$-tuple of simplices of $\mathcal{F}_s$ selected above. The number of such $j$-vertices $v$ is thus $O(|\mathcal{F}_s|^{d-j}\,|\mathcal{G}_s|^{j+\varepsilon})$ for any $\varepsilon > 0$. Notice now that the boundary of the projection of every simplex in $\Gamma_s$ intersects the interior of parent($s$). This implies that $|\Gamma_s| = O(n/\rho^{i_s-1})$, where $i_s$ is the level of $s$ in $H$. The above quantity is thus bounded by $O\big((n/\rho^{i_s-1})^{d+\varepsilon}\big)$ for any $\varepsilon > 0$. Summing over all simplices $s$, the number of $j$-vertices is

$$\sum_{i=1}^{k} |\Xi_i| \cdot O\!\left(\left(\frac{n}{\rho^{i-1}}\right)^{d+\varepsilon}\right) = O\!\left(\sum_{i=1}^{k} \frac{\rho^{di}\, n^{d+\varepsilon}}{\rho^{(d+\varepsilon)(i-1)}}\right), \qquad (2)$$

which equals $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$. To see this, note that the exponent of $\rho$ in the $i$th term is $d + \varepsilon - \varepsilon i$, so the sum is a decreasing geometric series bounded by a constant times $n^{d+\varepsilon}$. Noticing that the bound does not depend on $j$ completes the proof.



Theorem 4. Given two collections $\mathcal{F}$ and $\mathcal{G}$ of $n$ triangles in three dimensions, the complexity of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $O(n^2\alpha(n)\log n)$.

We skip the proof and refer the reader to the full version of this paper [14].

2.3 Algorithms

Theorem 5. Let $\mathcal{F}$ and $\mathcal{G}$ be two collections of $n$ $d$-simplices in $\mathbb{R}^{d+1}$. A complete combinatorial representation of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ can be constructed in randomized expected time $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$. Representations of $\mathcal{E}(\mathcal{F})$ and $\mathcal{S}(\mathcal{F}, \mathcal{G})$ can be constructed in time $O(n^d\alpha(n))$. When $d = 2$, the running time for $\mathcal{O}(\mathcal{F}, \mathcal{G})$ is $O(n^2\alpha(n)\log n)$.

Proof. As in the proof of Theorem 3, consider the $(d-1)$-faces of the projections of the simplices of $\mathcal{F}$ and $\mathcal{G}$ onto $x_{d+1} = 0$. For each face, consider the $(d-1)$-hyperplane it spans. Let $\mathcal{H}$ be the collection of $2(d+1)n$ hyperplanes defined in this manner by all the simplices of $\mathcal{F}$ and $\mathcal{G}$. Consider the refinement of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ with these hyperplanes, as in [10,18]. The cross-section of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ within some $h \in \mathcal{H}$ is actually the overlay $\mathcal{O}(\mathcal{F}_h, \mathcal{G}_h)$, where $\mathcal{F}_h$ (resp., $\mathcal{G}_h$) is the collection of cross-sections of the simplices of $\mathcal{F}$ (resp., of $\mathcal{G}$) with the $x_{d+1}$-vertical hyperplane spanned by $h$. Theorem 3 implies that the complexity of $\mathcal{O}(\mathcal{F}_h, \mathcal{G}_h)$ is $O(n^{d-1+\varepsilon})$ for every $\varepsilon > 0$. Therefore, refining $\mathcal{O}(\mathcal{F}, \mathcal{G})$ (which is a subdivision of $\mathbb{R}^d$) with the $O(n)$ hyperplanes of $\mathcal{H}$ does not asymptotically increase the complexity of the subdivision, which remains $O(n^{d+\varepsilon})$ for every $\varepsilon > 0$. Each cell in the resulting refined subdivision is convex. It can thus be easily decomposed into simplices using the bottom-vertex simplicial decomposition [8,15].

This representation of $\mathcal{O}(\mathcal{F}, \mathcal{G})$ allows us to construct this overlay using a standard randomized incremental approach that utilizes a conflict graph. In fact, our setting fits into standard abstract frameworks; see, e.g., [6, Section 5.2].




Matching Polyhedral Terrains Using Overlays of Envelopes 121 



The construction proceeds by choosing a random permutation of the simplices of $\mathcal{F} \cup \mathcal{G}$. Throughout the rest of this paragraph we refer to these simplices, extended by hyperplanes as above, as "objects", to avoid confusion with the simplices of the decomposition. We first construct in constant time the decomposition of the "overlay" of just the first object. We then add the objects one by one according to the random order. With every addition of an object, we insert it into the overlay and update the decomposition and the conflict graph. The conflict graph stores, for every simplex in the decomposition, a list of the objects (not yet added) that intersect it. Additionally, it stores for every such object a list of the simplices that it intersects. This allows knowing which simplices are affected by the addition of a particular object. The restructuring of all affected simplices and their conflict lists is a standard procedure, and we omit its rather routine details. By standard arguments [6, Section 5.2], the expected running time of the construction algorithm is

$$O\!\left(n \sum_{r=1}^{n} \frac{f(r)}{r^2}\right),$$

where $f(r)$ denotes the maximal complexity of the overlay of envelopes of two sets of $r$ simplices overall. Theorem 3 shows that $f(r) = O(r^{d+\varepsilon})$ for any $\varepsilon > 0$. The running time of the algorithm is thus $O(n^{d+\varepsilon})$ for any $\varepsilon > 0$. For $d = 2$ the running time becomes $O(n^2\alpha(n)\log n)$ due to Theorem 4.

A single envelope $\mathcal{E}(\mathcal{F})$ and the sandwich region $\mathcal{S}(\mathcal{F}, \mathcal{G})$ can be constructed analogously in time $O(n^d\alpha(n))$. This uses the fact that these structures can also be refined into convex subdivisions, which can be decomposed using the bottom-vertex decomposition.

Remark 2. We note that an $O(n^2\alpha(n)\log n)$ randomized incremental algorithm for constructing $\mathcal{E}(\mathcal{F})$ when $d = 2$ has been described by Boissonnat and Dobrindt [5]. Their goal was to obtain an on-line algorithm. Their construction followed a different approach, using a two-level history graph instead of the conflict graph. Also, a randomized divide-and-conquer algorithm for constructing $\mathcal{E}(\mathcal{F})$ in time $O(n^{d+\varepsilon})$, for any $\varepsilon > 0$ and $d > 2$, has been described by Sharir and Agarwal [20, Section 7.2.2].

3 Matching Terrains 

In this section we apply the above results for overlays and sandwich regions to matching terrains in an arbitrary fixed dimension. A ($k$-dimensional) terrain $F$ in $\mathbb{R}^{d+1}$ is the graph $F = \{(x, f(x)) \mid x \in D_f\}$ of a $k$-variate function $f : D_f \to \mathbb{R}$, $0 \le k \le d$, where the domain $D_f$ is a $k$-dimensional subset of $\mathbb{R}^d$. $F$ is a polyhedral terrain if $D_f$ is a polyhedral subset of $\mathbb{R}^d$ and $f$ is a linear function over each polyhedron in $D_f$. Hence, a polyhedral terrain is a polyhedral set with the property that every $x_{d+1}$-vertical line intersects the terrain in at most one point. We assume in the following that a polyhedral set always consists of a collection of simplices. As long as the terrains are given as collections of convex polytopes this assumption is not restrictive, since each convex polytope of complexity $n$ can be easily partitioned into $O(n)$ simplices [8,15]. Hence we can associate with each terrain $F$ a simplicial partition $\mathcal{M}_F$ of its domain $D_f$ such that $f$ is linear over each simplex in $\mathcal{M}_F$.



3.1 Perpendicular Distance 

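Let two polyhedral terrains $F = \{(x, f(x)) \mid x \in D_f\}$ and $G = \{(x, g(x)) \mid x \in D_g\}$ in $\mathbb{R}^{d+1}$, of complexity $m$ and $n$ respectively, be given. Since each terrain intersects every vertical line at most once, it is natural to consider the height difference between vertically adjacent points of $F$ and $G$ as a distance measure. We therefore consider the perpendicular distance (also called uniform metric or Chebyshev metric) $\delta_\perp(F, G) = \sup_{x \in D_f} |f(x) - g(x)|$, where we assume that $D_f \subseteq D_g$. Notice that the perpendicular distance is the standard $L_\infty$-Minkowski metric for the functions $f$ and $g$.

We consider a translation $t' = (t_1, \dots, t_d, t_{d+1}) \in \mathbb{R}^{d+1}$ to be composed of a translation $t = (t_1, \dots, t_d) \in \mathbb{R}^d$ and a translation $t_{d+1} \in \mathbb{R}$; hence $t' = (t, t_{d+1})$. Using this notation we have $F + t' = \{(x, f(x - t) + t_{d+1}) \mid x \in D_f + t\}$. We wish to compute a translation $(t^*, t^*_{d+1}) \in \mathbb{R}^{d+1}$, where $t^* \in \mathbb{R}^d$ and $t^*_{d+1} \in \mathbb{R}$, such that $D_f + t^* \subseteq D_g$ and the perpendicular distance between $F$ and $G$ is minimized, hence

$$\delta_\perp(F + (t^*, t^*_{d+1}), G) = \min_{\substack{t \in \mathbb{R}^d \\ D_f + t \subseteq D_g}} \ \min_{t_{d+1} \in \mathbb{R}} \ \delta_\perp(F + (t, t_{d+1}), G). \qquad (4)$$

Reformulating $\delta_\perp(F + t', G)$ produces

$$\delta_\perp(F + t', G) = \max_{x \in D_f + t \subseteq D_g} |f(x - t) - g(x) + t_{d+1}| \qquad (5)$$

$$= \max\Big\{ \max_{x \in D_f \subseteq D_g - t} h_x(t) - t_{d+1},\ \ t_{d+1} - \min_{x \in D_f \subseteq D_g - t} h_x(t) \Big\} \qquad (6)$$

with $h_x(t) = g(x + t) - f(x)$. Observe that the condition $D_f \subseteq D_g - t$ is equivalent to $t \in D_h$, where $D_h = \overline{\overline{D_g} \oplus (-D_f)}$. Here and throughout the rest of the paper, $A \oplus B = \bigcup_{a \in A} \bigcup_{b \in B} \{a + b\}$ denotes the Minkowski sum (or vector sum) of two sets $A, B \subset \mathbb{R}^d$, while $\overline{A}$ denotes the complement of a set $A \subset \mathbb{R}^d$. We define two functions:

$$\overline{h} : D_h \to \mathbb{R}; \quad t \mapsto \max_{x \in D_f} h_x(t) \qquad (7)$$

$$\underline{h} : D_h \to \mathbb{R}; \quad t \mapsto \min_{x \in D_f} h_x(t). \qquad (8)$$

Let $\overline{H}$ and $\underline{H}$ be the respective graphs of $\overline{h}$ and $\underline{h}$. These graphs are polyhedral terrains, and are, respectively, the upper and lower envelopes of the functions $h_x$, for all $x \in D_f$. Let $-F$ denote the set $-F = \{(-x, -f(x)) \mid x \in D_f\}$. The next lemma follows directly from the definitions of $\overline{H}$ and $\underline{H}$.

Lemma 2. Let $F, G, \overline{H}, \underline{H}$ be as defined above. Then $\overline{H}$ (resp., $\underline{H}$) is the upper (resp., the lower) envelope of $G \oplus (-F)$ restricted to the region above $D_h$.

Reformulating (4) we have

$$\delta_\perp(F + (t^*, t^*_{d+1}), G) = \min_{t \in D_h} \ \min_{t_{d+1} \in \mathbb{R}} \ \max\{\overline{h}(t) - t_{d+1},\ t_{d+1} - \underline{h}(t)\}. \qquad (9)$$

For fixed $t$, the inner maximum is minimized when its two terms are equal. This implies that the translation $(t^*, t^*_{d+1}) \in \mathbb{R}^{d+1}$ we are seeking is such that

$$\frac{\overline{h}(t^*) - \underline{h}(t^*)}{2} = \min_{t \in D_h} \frac{\overline{h}(t) - \underline{h}(t)}{2} \qquad (10)$$

and

$$t^*_{d+1} = \frac{\overline{h}(t^*) + \underline{h}(t^*)}{2}. \qquad (11)$$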

This leads to the following algorithm: We compute = Dg (B {—Dj) by com- 
puting the Minkowski sums for each pair of simplices, one from Dg and one from 
(—Df), and computing the complement of their union with a brute-force arrange- 
ment approach, in 0{{mn)‘^) time. See the full version of this paper [14] for the 
technical details. The lower envelope £ and the upper envelope £ of G (B {—F) 
are envelopes of nm pairs of simplices. Let M be the simplicial partition of the 
domain of {{t, (h{t) — h{t))/2) \ t G -Dfij. Lemma 2 implies that M is the overlay 
of £ and £ restricted to Dh- We compute the overlay of £ and £, additionally 
superimposed with the 0{mn) hyperplanes defining Dh- Notice that in the proof 
of Theorem 5 the overlay of the envelopes is additionally overlaid with a set of 
hyperplanes, and for the correctness of the proof only the number of the hyper- 
planes matters. Thus, Theorem 5 implies that the overlay can be constructed in 
0{{mnY'^^) time for any e > 0. The function {h{t) — h{t)) /2 is linear within each 
simplex of M. Therefore, the global minimum t* that minimizes this function, as 
described in (10), is necessarily reached at a vertex of M^-. Hence it suffices to 
iterate over all vertices in , which takes time proportional to their number, 
which is for any e > 0. We thus obtain t*, which we can plug into 

(11) to get 

Overall, the described algorithm runs in time 0((mn)‘^+®), for d > 2. For 
d = 1 we can compute £ and £ in time O ( mn log (mn)) [12]. The computation of 
the overlay and the clipping can be done with a simple sweep in 0{mna{mn)) 
time. For d = 2 we can construct £ and £ in time 0{{mn)‘^a{mn)) [11] and 
construct the overlay in time 0((mn)^a(mn)log(mn)) using Theorem 5. The 
following theorem summarizes the results of this section. 

Theorem 6. Let F and G be two polyhedral terrains in $\mathbb{R}^{d+1}$ with complexities m and n, respectively. We can decide whether there exists a translation of the domain of F to within the interior of the domain of G, and we can compute a translation that minimizes the perpendicular distance between F and G, in time $O((mn)^{d+\varepsilon})$ for any $\varepsilon > 0$. For $d = 1$ the running time is $O(mn \log(mn))$ and for $d = 2$ the running time is $O((mn)^2 \alpha(mn) \log(mn))$.
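For intuition about why a half-gap of the form $(\overline{h}(t) - \underline{h}(t))/2$ is minimized, here is a small sketch for $d = 1$. It rests on an assumption (our reading, not stated in this excerpt): for a fixed horizontal shift t, $\overline{h}(t)$ and $\underline{h}(t)$ are the largest and smallest vertical gaps $G(x + t) - F(x)$ over the domain of F. Under that reading, the best vertical offset is the midpoint of the two, and the resulting maximum vertical distance is half their difference. Terrains are hypothetical vertex lists `[(x, y), ...]` sorted by x; `interp` and `vertical_fit` are illustrative names, not the authors' code.

```python
import bisect

def interp(vertices, x):
    """Evaluate a piecewise-linear function given by sorted (x, y) vertices at x."""
    xs = [vx for vx, _ in vertices]
    i = bisect.bisect_right(xs, x) - 1
    i = min(max(i, 0), len(vertices) - 2)  # clamp so the last vertex uses the last piece
    (x0, y0), (x1, y1) = vertices[i], vertices[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def vertical_fit(F, G, t):
    """Return (best vertical offset z, max |G(x + t) - F(x) - z| over dom(F)).

    The gap G(x + t) - F(x) is linear between the breakpoints of F and the
    shifted breakpoints of G, so its extremes occur at those breakpoints.
    """
    lo_f, hi_f = F[0][0], F[-1][0]
    if not (G[0][0] <= lo_f + t and hi_f + t <= G[-1][0]):
        raise ValueError("shifted domain of F must lie inside the domain of G")
    xs = {x for x, _ in F} | {x - t for x, _ in G if lo_f <= x - t <= hi_f}
    gaps = [interp(G, x + t) - interp(F, x) for x in sorted(xs)]
    hi, lo = max(gaps), min(gaps)
    return (hi + lo) / 2, (hi - lo) / 2  # midpoint offset, half-gap distance
```

For example, a tent-shaped F over $[0, 2]$ under a flat G at height 5 gives gaps 5, 4, 5, so the best offset is 4.5 and the remaining distance is 0.5.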

3.2 Hausdorff Distance 

Given two polyhedral terrains F and G in $\mathbb{R}^{d+1}$, we consider the task of computing the translation that brings F into the smallest possible distance to G, according to the (directed or undirected) Hausdorff distance measure. We accomplish this by first solving the corresponding decision problem: for F and G as above, decide whether there exists a translation that brings F into Hausdorff distance at most $\delta$ of G, for a given parameter $\delta > 0$. For technical reasons, we assume that one of the terrains we have to match, say G, is continuously defined over a convex domain. In the full version of the paper [14] we prove our results for a more general class of terrains called $\delta$-terrains (where $\delta$ is the threshold parameter in the decision procedure).

Let $\rho$ be a metric in $\mathbb{R}^{d+1}$ and let $A, B \subset \mathbb{R}^{d+1}$ be two compact sets. The directed Hausdorff distance $\delta'_H(A, B)$ is defined as $\delta'_H(A, B) = \max_{x \in A} \min_{y \in B} \rho(x, y)$. The (undirected) Hausdorff distance $\delta_H(A, B)$ is defined as $\delta_H(A, B) = \max\{\delta'_H(A, B), \delta'_H(B, A)\}$. The Hausdorff distance is a natural way to extend a metric $\rho$ to the class of compact sets. Note that $\delta_H$ is indeed a metric, while $\delta'_H$ is not, since it is not symmetric. Nonetheless, the directed Hausdorff distance is often used in partial matching applications, where the task is to find a subset of the shape B that resembles the shape A the most.
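The two definitions can be checked directly on finite point sets, where the max and min range over finitely many points. The following brute-force sketch (quadratic in the number of points) only illustrates the definitions; it is not the paper's algorithm, which handles polyhedral terrains under translation. The metric $\rho$ is passed in as a function, with the Euclidean metric as an assumed default; `linf` is an illustrative $L_\infty$ metric.

```python
import math

def directed_hausdorff(A, B, rho=math.dist):
    """delta'_H(A, B): max over x in A of min over y in B of rho(x, y)."""
    return max(min(rho(x, y) for y in B) for x in A)

def hausdorff(A, B, rho=math.dist):
    """delta_H(A, B) = max(delta'_H(A, B), delta'_H(B, A)); symmetric in A and B."""
    return max(directed_hausdorff(A, B, rho), directed_hausdorff(B, A, rho))

def linf(p, q):
    """L_infinity metric on tuples of equal length."""
    return max(abs(a - b) for a, b in zip(p, q))
```

With $A = \{(0,0)\}$ and $B = \{(0,0), (3,4)\}$, $\delta'_H(A, B) = 0$ while $\delta'_H(B, A) = 5$, which shows the asymmetry mentioned above.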

Our algorithms assume that $\rho$ is a convex polyhedral metric of constant description complexity that has the property that the $(d + 1)$-dimensional unit ball vertically projects onto the d-dimensional unit ball (defined in terms of $\rho$). This is, for example, the case for the commonly used $L_\infty$ and $L_1$ metrics. We call a metric that satisfies these assumptions projectable. For $d = 1$ our approach also works for the Euclidean metric.

Unfortunately, due to space limitations we have to omit all technical details of our treatment of matching under the Hausdorff distance. We state the main results without proof; the proofs proceed by reducing the described decision problems to testing whether a certain sandwich region in an appropriately defined arrangement is empty. Details can be found in the full version of this paper [14].

Theorem 7. Let F be a polyhedral set and G be a convex-domain polyhedral terrain in $\mathbb{R}^{d+1}$, with respective complexities m and n. We can test whether there exists a translation in $\mathbb{R}^{d+1}$ that brings F into directed Hausdorff distance at most $\delta$ of G in the following time:

- $O(mn \log(mn))$ when $d = 1$ and the underlying metric is projectable, and $O(mn\,2^{\alpha(mn)} \log(mn))$ when the metric is Euclidean.

— n)) when d > 2 and the metric is projectable. 

For the undirected Hausdorff distance we obtain similar runtimes which are 
symmetric in m and n. In order to solve the optimization problem we apply the 
technique of parametric searching [17]. 

Theorem 8. Let F and G be convex-domain polyhedral terrains in $\mathbb{R}^{d+1}$ with respective complexities m and n. Put $N = m + n$. We can compute a translation
that minimizes the directed (resp., undirected) Hausdorff distance between F 
and G in time ■'■^) (resp., 0{N'^ '^'^'^^)), for any £ > 0 and d > 1, 

assuming that the underlying point metric is projectable. 




Matching Polyhedral Terrains Using Overlays of Envelopes 125 



Acknowledgements. We are particularly grateful to Danny Halperin for pointing us to the work of de Berg et al. [9]. We would also like to thank Pankaj K. Agarwal and Micha Sharir for helpful discussions.

References 

1. P. K. Agarwal, S. Har-Peled, M. Sharir, and Y. Wang. Hausdorff distance under 
translation for points, disks, and balls. In Proc. Symp. Comp. Geom., 2003. 

2. P. K. Agarwal, O. Schwarzkopf, and M. Sharir. The overlay of lower envelopes and 
its applications. Discrete and Computational Geometry, 15:1-13, 1996. 

3. P. K. Agarwal and M. Sharir. Arrangements and their applications. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry, pages 49-119. Elsevier Science Publishers B.V. North-Holland, Amsterdam, 2000.

4. H. Alt and L. Guibas. Discrete geometric shapes: Matching, interpolation, and approximation - a survey. In J.-R. Sack and J. Urrutia, editors, Handbook of Computational Geometry. Elsevier Science Publishers B.V. North-Holland, 2000.

5. J.-D. Boissonnat and K. Dobrindt. On-line construction of the upper envelope of triangles and surface patches in three dimensions. Comput. Geom. Theory Appl., 5:303-320, 1996.

6. J.-D. Boissonnat and M. Yvinec. Algorithmic Geometry. Cambridge University 
Press, UK, 1998. 

7. B. Chazelle. Cutting hyperplanes for divide-and-conquer. Discrete Comput. Geom., 
9(2):145-158, 1993. 

8. K. L. Clarkson. A randomized algorithm for closest-point queries. SIAM Journal 
on Computing, 17:830-847, 1988. 

9. M. de Berg, L. J. Guibas, and D. Halperin. Vertical decompositions for triangles 
in 3-space. Discrete Comput. Geom., 15:35-61, 1996. 

10. H. Edelsbrunner. The upper envelope of piecewise linear functions: Tight complex- 
ity bounds in higher dimensions. Discrete and Comp. Geom., 4:337-343, 1989. 

11. H. Edelsbrunner, L. Guibas, and M. Sharir. The upper envelope of piecewise linear 
functions: algorithms and applications. Discr. Comp. Geom., 4:311-336, 1989. 

12. J. Hershberger. Finding the upper envelope of n line segments in O(n log n) time. Inform. Process. Lett., 33:169-174, 1989.

13. V. Koltun and M. Sharir. The partition technique for overlays of envelopes. SIAM 
Journal on Computing, to appear. 

14. V. Koltun and C. Wenk. Matching polyhedral terrains using overlays of envelopes. 
http://www.cs.berkeley.edu/~vladlen/linear-overlays-full.zip. 

15. C. Lee. Subdivisions and triangulations of polytopes. In J. E. Goodman and J. O’Rourke, editors, Handbook of Discrete and Computational Geometry, pages 271-290. CRC Press, 1997.

16. J. Matousek. Range searching with efficient hierarchical cuttings. Discrete Comput. 
Geom., 10(2):157-182, 1993. 

17. N. Megiddo. Applying parallel computation algorithms in the design of serial 
algorithms. J. ACM, 30(4):852-865, 1983. 

18. J. Pach and M. Sharir. The upper envelope of piecewise linear functions and the 
boundary of a region enclosed by convex plates: combinatorial analysis. Discrete 
Comput. Geom., 4:291-309, 1989. 

19. M. Sharir. Almost tight upper bounds for lower envelopes in higher dimensions. 
Discrete and Computational Geometry, 12:327-345, 1994. 







V. Koltun and C. Wenk 



20. M. Sharir and P. K. Agarwal. Davenport-Schinzel Sequences and Their Geometric 
Applications. Cambridge University Press, New York, 1995. 

21. B. Tagansky. A new technique for analyzing substructures in arrangements of 
piecewise linear surfaces. Discrete and Computational Geometry, 16:455-479, 1996. 

22. C. Wenk. Shape Matching in Higher Dimensions. Ph.D. thesis, Free University Berlin, Berlin, Germany, 2002.

23. A. Wiernik and M. Sharir. Planar realizations of nonlinear Davenport-Schinzel 
sequences by segments. Discrete Comput. Geom., 3:15-47, 1988. 




Independent Set of Intersection Graphs of Convex 

Objects in 2D* 



Pankaj K. Agarwal and Nabil H. Mustafa 



Department of Computer Science, 

Duke University, Durham, NC 27708-0129, USA. 
{pankaj, nabil}@cs.duke.edu



Abstract. The intersection graph of a set of geometric objects is defined as a graph $G = (S, E)$ in which there is an edge between two nodes $s_i, s_j \in S$ if $s_i \cap s_j \neq \emptyset$. The problem of computing a maximum independent set in the intersection graph of a set of objects is known to be NP-complete for most cases in two and higher dimensions. We present approximation algorithms for computing a maximum independent set of intersection graphs of convex objects in $\mathbb{R}^2$. Specifically, given a set of n line segments in the plane with maximum independent set of size $\kappa$, we present algorithms that find an independent set of size at least (i) $(\kappa/(2\log(2n/\kappa)))^{1/2}$ in time $O(n^3)$ and (ii) $(\kappa/(2\log(2n/\kappa)))^{1/4}$ in time $O(n^{4/3} \log^c n)$. For a set of n convex objects with maximum independent set of size $\kappa$, we present an algorithm that finds an independent set of size at least $(\kappa/(2\log(2n/\kappa)))^{1/4}$ in time $O(n^3 + \tau(S))$, assuming that S can be preprocessed in time $\tau(S)$ to answer certain primitive operations on these convex sets.



1 Introduction 

An independent set of a graph is a subset of pairwise nonadjacent nodes of the graph. The maximum-independent-set problem asks for computing a largest independent set of a given graph. Computing the largest independent set in a graph is known to be NP-complete even for many restricted cases (e.g., planar graphs [11], bounded-degree graphs [19], geometric graphs [20]). Naturally, attention then turned toward approximating the largest independent set in polynomial time. Unfortunately, the existence of polynomial-time algorithms for approximating the maximum independent set efficiently for general graphs is unlikely [12]. However, efficient approximation algorithms are known for many restricted classes of graphs. For planar graphs, approximation algorithms exist that can compute an independent set of size arbitrarily close to the size of the maximum independent set. Noting that a graph is planar if and only if it is the contact graph of a set of disks in the plane [17], a natural direction is to investigate the independent-set problem for the graphs induced by a set of geometric objects. The intriguing question there is whether (and which) geometric properties of the objects aid in the efficient computation of a maximum independent set. One well-studied family of graphs arising from geometric objects is the family of so-called intersection graphs.

Given a set $S = \{s_1, \ldots, s_n\}$ of geometric objects in $\mathbb{R}^d$, the intersection graph of S, $G_S = (V, E)$, is defined as follows: each node $v_i \in V$ corresponds to the object $s_i$, and $e_{ij} \in E$ if $s_i \cap s_j \neq \emptyset$. A subset $V' \subseteq V$ is an independent set in $G_S$ if for every pair of nodes $v_i, v_j \in V'$, $s_i \cap s_j = \emptyset$. For brevity, we say “independent set of S” when we mean “independent set of the intersection graph of S”. In this paper, we present approximation algorithms for the independent-set problem on intersection graphs of line segments and convex objects in the plane.
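To make the definition concrete, the sketch below builds the intersection graph of closed line segments with the standard orientation (cross-product) test and checks whether a set of indices is independent. Segments are assumed to be endpoint pairs `((x1, y1), (x2, y2))`; all function names are illustrative. This is the brute-force $O(n^2)$ construction, not an algorithm from the paper.

```python
from itertools import combinations

def orient(p, q, r):
    """Twice the signed area of triangle pqr (> 0: left turn, < 0: right turn)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def in_box(p, q, r):
    """Whether r lies in the bounding box of pq (meaningful when r is collinear with pq)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(s, t):
    """Closed segments s and t share at least one point."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:  # proper crossing
        return True
    # Touching or collinear-overlap cases.
    return ((d1 == 0 and in_box(p3, p4, p1)) or (d2 == 0 and in_box(p3, p4, p2))
            or (d3 == 0 and in_box(p1, p2, p3)) or (d4 == 0 and in_box(p1, p2, p4)))

def intersection_graph(segments):
    """Edge list of G_S: pairs (i, j), i < j, whose segments intersect."""
    return [(i, j) for i, j in combinations(range(len(segments)), 2)
            if segments_intersect(segments[i], segments[j])]

def is_independent(segments, indices):
    """True if no two segments with the given indices intersect."""
    return all(not segments_intersect(segments[i], segments[j])
               for i, j in combinations(indices, 2))
```

Closed segments are used, so two segments that merely touch at an endpoint are adjacent in $G_S$.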

Besides the inherent interest mentioned above, independent sets of intersection graphs have found applications in map labeling in computational cartography [2], and frequency assignment in cellular networks [16]. For example, in the map-labeling problem, we are given a set of labels of geometric objects, and the goal is to place the maximum number of labels that are pairwise disjoint. Computing the maximum independent set of these labels yields a labeling with the maximum number of labeled objects.

Related work. Given S, let $I^*(S)$ denote a maximum independent set of the intersection graph of S. Define $\kappa(S) = |I^*(S)|$. We will use $\kappa$ to denote $\kappa(S)$ if S is clear from the context. We say that an algorithm computes a c-approximation to $I^*(S)$ if it computes an independent set of size at least $\kappa(S)/c$.

For a general graph $G(V, E)$ with n vertices, there cannot be a polynomial-time approximation algorithm with approximation ratio better than $n^{1-\varepsilon}$ for any $\varepsilon > 0$ unless NP = ZPP [12]. Currently the best algorithm for a general graph finds an independent set of size $\Omega(\kappa \cdot \log^2 n / n)$ [6], where $\kappa$ is the size of a maximum independent set in G.

However, for intersection graphs of geometric objects, better approximation ratios are possible. If I is a set of intervals in $\mathbb{R}$, then the maximum independent set of the intersection graph of I can be computed in linear time. Computing $I^*(S)$ is known to be NP-complete if S is a set of unit disks or a set of orthogonal segments in $\mathbb{R}^2$ [14]. For unit disks in $\mathbb{R}^2$, a polynomial-time $(1 + \varepsilon)$-approximation scheme was proposed in [13]. For arbitrary disks, Erlebach et al. [10] and Chan [7] independently presented a polynomial-time $(1 + \varepsilon)$-approximation scheme. The above schemes for computing independent sets of disks use shifted dissection, which relies heavily on the fact that the geometric objects are disks. A divide-and-conquer technique is used for the case of the intersection graphs of axis-parallel rectangles in the plane, for which Agarwal et al. [2] presented an $O(\log n)$-approximation algorithm in time $O(n \log n)$. If the rectangles have unit height, they
describe a ( 1 -h e) -approximation scheme with running time 0{n log n + These 

results have recently been improved (and simplified) in Chan [8]. Efficient algorithms are known for other classes of graphs as well [3,5].
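The linear-time bound for intervals mentioned above refers to the classical greedy rule: scan the intervals by increasing right endpoint and keep each one that starts after the last kept interval ends. The scan itself is linear once the intervals are sorted. The sketch below sorts them itself, which costs $O(n \log n)$. Intervals are assumed closed, so intervals that share an endpoint count as intersecting.

```python
import math

def max_disjoint_intervals(intervals):
    """Maximum set of pairwise-disjoint closed intervals (greedy by right endpoint)."""
    chosen = []
    last_right = -math.inf
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:  # strictly after the last kept interval: disjoint
            chosen.append((left, right))
            last_right = right
    return chosen
```

The standard exchange argument applies: some optimal solution contains the interval with the smallest right endpoint, and removing everything it meets leaves a smaller instance of the same problem.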

In other related work [15], it was shown that the problem of recognizing intersection 
graphs of line segments, i.e., given a graph G, does there exist a set of segments whose 
intersection graph is G, is NP-hard. 

Our results. The main results of this paper are two approximation algorithms for a set S of n segments in $\mathbb{R}^2$, all of which intersect a common vertical line. We show that we can compute

- in $O(n^3)$ time an independent set of S of size at least $\sqrt{\kappa}$, and

- in $O(n^{4/3} \log^c n)$ time an independent set of S of size at least $\kappa^{1/4}$.

Using these results, we show that for an arbitrary set S of segments in $\mathbb{R}^2$ we can compute an independent set of size at least

- $\sqrt{\kappa/(2\log(2n/\kappa))}$ in time $O(n^3)$, or

- $(\kappa/(2\log(2n/\kappa)))^{1/4}$ in time $O(n^{4/3} \log^c n)$.

Finally, we extend our results to convex sets. Namely, for a family S of n convex sets in $\mathbb{R}^2$ we can compute in $O(n^3 + \tau(S))$ time an independent set of size at least $(\kappa/(2\log(2n/\kappa)))^{1/4}$, assuming that certain primitive operations (namely sidedness and pairwise object intersection queries) on these convex sets can be performed by preprocessing S in $\tau(S)$ time.

* All logarithms in this paper are base 2.

The paper is organized as follows. In Section 2 we describe the $\sqrt{\kappa}$-approximation algorithm for a set of segments with a vertical stabbing line, and in Section 3 we describe the $\kappa^{1/4}$-approximation algorithm. Section 4 shows how to extend these results to arbitrary segments, and Section 5 describes the algorithm for convex sets.

2 A $\sqrt{\kappa}$-Approximation Algorithm for Segments

Let $S = \{s_1, \ldots, s_n\}$ be a set of line segments in the plane. Let $x(p), y(p)$ denote the x- and y-coordinates of a point $p \in \mathbb{R}^2$. Let $l(s)$ (resp. $r(s)$) denote the x-coordinate of the left (resp. right) endpoint of the segment $s \in S$, and let $\sigma_i$ denote the slope of $s_i$. We assume that all the segments in S intersect the y-axis. We also assume that the segments in S are sorted in increasing order of their intersection points with the y-axis, and we use $S = \langle s_1, \ldots, s_n \rangle$ to denote this sorted sequence.

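We call a subsequence $S' = \langle s_{i_1}, \ldots, s_{i_m} \rangle$ of S s-monotone (see Figure 1(a)) if

- $s_{i_j} \cap s_{i_k} = \emptyset$ for all $1 \leq j < k \leq m$, and

- $\sigma_{i_j} < \sigma_{i_{j+1}}$ for all $1 \leq j < m$ (called increasing s-monotone) or $\sigma_{i_j} > \sigma_{i_{j+1}}$ for all $1 \leq j < m$ (called decreasing s-monotone).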



Lemma 1. Let $I \subseteq S$ be a subsequence of pairwise-disjoint segments. There exists an s-monotone sequence $I' \subseteq I$ of size at least $\sqrt{|I|}$.

Proof. By Dilworth’s theorem [9], there is a subsequence $I'$ of I such that the slopes of the segments are either monotonically increasing or monotonically decreasing, and the size of $I'$ is at least $\sqrt{|I|}$. Since the segments in I are pairwise disjoint, $I'$ is s-monotone. □
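The bound in Lemma 1 (and later in Lemma 3) is the Erdős–Szekeres consequence of Dilworth's theorem: any sequence of N distinct numbers has a monotone subsequence of length at least $\lceil\sqrt{N}\rceil$. A short sketch, which one could apply to the slopes of pairwise-disjoint segments listed in y-axis order, finds the longest increasing and longest decreasing subsequences with patience sorting in $O(N \log N)$ time. Distinct values are assumed, matching the strict inequalities in the definition of s-monotone; the function names are illustrative.

```python
import bisect
import math

def longest_increasing_length(seq):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k]: smallest last value of an increasing run of length k + 1
    for x in seq:
        k = bisect.bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def longest_monotone_length(seq):
    """max(longest increasing, longest decreasing); at least ceil(sqrt(len(seq)))."""
    return max(longest_increasing_length(seq),
               longest_increasing_length([-x for x in seq]))
```

Negating the sequence turns decreasing runs into increasing ones, so a single routine covers both directions.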



We describe an algorithm for computing the longest s-monotone subsequence of S. By Lemma 1, its size is at least $\sqrt{\kappa}$. Without loss of generality, we describe how to compute the longest increasing s-monotone subsequence; the same procedure can compute the longest decreasing s-monotone subsequence of S, and we return the longer of the two.

Fig. 1. (a) Bold segments form an increasing s-monotone sequence, (b) $S_{ij}$ in solid, (c) $s_i$, $s_j$, $s_k$ as in the proof of Lemma 2.

We add a segment $s_0$ to S such that it intersects the y-axis below all the segments of S, does not intersect any segment of S, $\sigma_0 < \sigma_i$ for all $i \geq 1$, and it spans all the other segments of S (i.e., $l(s_0) < l(s_i)$ and $r(s_0) > r(s_i)$ for all $1 \leq i \leq n$). We add another similar segment $s_{n+1}$ that intersects the y-axis above all the other segments in S, does not intersect any segment in S, and $\sigma_{n+1} > \sigma_i$ for all $i \leq n$.

For $0 \leq i < j \leq n+1$ such that $s_i \cap s_j = \emptyset$ and $\sigma_i < \sigma_j$, let $S_{ij} \subseteq S$ denote the subsequence of segments $s_k$ such that

(S1) $i < k < j$,

(S2) $\sigma_i < \sigma_k < \sigma_j$,

(S3) $l(s_k) > \max\{l(s_i), l(s_j)\}$,

(S4) $s_i \cap s_k = \emptyset$ and $s_j \cap s_k = \emptyset$.

See Figure 1(b) for an illustration of $S_{ij}$. Let $\Phi(i, j) \subseteq S_{ij}$ denote the longest increasing s-monotone subsequence of $S_{ij}$. If there is more than one such sequence, we choose the lexicographically minimum one. Set $\phi(i, j) = |\Phi(i, j)|$. We wish to compute $\phi(0, n+1)$. Note that by definition of $s_0$ and $s_{n+1}$, $S_{0(n+1)} = S$.

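Lemma 2. For all $0 \leq i < j \leq n + 1$,

$$\phi(i, j) = \max_{s_k \in S_{ij}} \left( \phi(i, k) + \phi(k, j) + 1 \right), \qquad (1)$$

where $\phi(i, j) = 0$ if $S_{ij} = \emptyset$.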

Proof. Let $\Phi(i, j) = \langle s_{a_1}, \ldots, s_{a_u} \rangle$. Let $s_{a_h}$ be the segment in $\Phi(i, j)$ with the leftmost left endpoint, i.e., $l(s_{a_h}) \leq l(s_{a_k})$ for all $1 \leq k \leq u$. Note that $\langle s_{a_1}, \ldots, s_{a_{h-1}} \rangle \subseteq S_{i a_h}$ and $\langle s_{a_{h+1}}, \ldots, s_{a_u} \rangle \subseteq S_{a_h j}$. Since each of these two subsequences is s-monotone and $s_{a_h} \in S_{ij}$, $\phi(i, a_h) \geq h - 1$ and $\phi(a_h, j) \geq u - h$. Therefore $\phi(i, j) = u \leq \phi(i, a_h) + \phi(a_h, j) + 1$, and hence

$$\phi(i, j) \leq \max_{s_k \in S_{ij}} \left( \phi(i, k) + \phi(k, j) + 1 \right).$$

Conversely, let $s_k \in S_{ij}$. By definition of $S_{ik}$ and $S_{kj}$ (cf. (S1)-(S4)), for all $s_\alpha \in S_{ik}$ and $s_\beta \in S_{kj}$:

(i) $i < \alpha < k < \beta < j$,

(ii) $\sigma_i < \sigma_\alpha < \sigma_k < \sigma_\beta < \sigma_j$,

(iii) $l(s_\alpha), l(s_\beta) > l(s_k) > \max\{l(s_i), l(s_j)\}$,

(iv) $s_\alpha \cap s_k = \emptyset$ and $s_\beta \cap s_k = \emptyset$.

See Figure 1(c). As observed in [18], (i)-(iv) imply that the line $\ell_k$ supporting $s_k$ intersects neither $s_\alpha$ nor $s_\beta$. Indeed, (i) and (ii) imply that $\ell_k$ does not intersect $s_\alpha$ or $s_\beta$ to the right of the y-axis, and (iii) and (iv) imply that $\ell_k$ does not intersect $s_\alpha$ or $s_\beta$ to the left of the y-axis. Since $s_\alpha$ and $s_\beta$ lie on opposite sides of $\ell_k$, they can neither intersect each other, nor can they intersect $s_i$ or $s_j$. Hence, the segments in $\Phi(i, k) \cup \Phi(k, j)$ are pairwise disjoint. Moreover, the fact that $s_\alpha$ and $s_\beta$ do not intersect $s_i$ or $s_j$, together with (i)-(iv), implies that $s_\alpha$ and $s_\beta$ satisfy (S1)-(S4) for $S_{ij}$. Hence $S_{ik} \cup S_{kj} \subseteq S_{ij}$.

Therefore the sequence $\Phi(i, k) \circ \langle s_k \rangle \circ \Phi(k, j)$ is an s-monotone subsequence of $S_{ij}$. Hence,

$$\phi(i, j) \geq \phi(i, k) + \phi(k, j) + 1.$$

This completes the proof of the lemma. □



We can compute $\phi(0, n+1)$ using a dynamic-programming approach. We can compute the set $S_{ij}$ in $O(n)$ time and $\phi(i, j)$ in another $O(n)$ time, assuming $\phi(i, k)$ and $\phi(k, j)$ have already been computed. Therefore, the total time spent in computing $\phi(i, j)$ for all $0 \leq i < j \leq n+1$ is $O(n^3)$. Putting everything together, we conclude the following.

Theorem 1. Given a set S of n segments in $\mathbb{R}^2$, all intersecting a common vertical line, one can compute an independent set of size at least $\sqrt{\kappa(S)}$ in time $O(n^3)$.
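The recurrence (1) translates directly into the $O(n^3)$ dynamic program behind Theorem 1. The sketch below rests on stated assumptions: segments are non-vertical closed segments `((x1, y1), (x2, y2))` that all cross the y-axis and are in general position (distinct left endpoints, since (S3) is a strict inequality). The sentinels $s_0$ and $s_{n+1}$ are modeled virtually by slopes $\mp\infty$ and a left endpoint at $-\infty$ rather than as actual segments. The decreasing case is handled by our own choice of reflecting $x \to -x$, which keeps y-intercepts and negates slopes. Only the size $\phi(0, n+1)$ is returned; all function names are illustrative.

```python
import math

def orient(p, q, r):
    """Twice the signed area of triangle pqr."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def in_box(p, q, r):
    """Whether r lies in the bounding box of pq (used when r is collinear with pq)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(s, t):
    """Closed segments s and t share at least one point."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and in_box(p3, p4, p1)) or (d2 == 0 and in_box(p3, p4, p2))
            or (d3 == 0 and in_box(p1, p2, p3)) or (d4 == 0 and in_box(p1, p2, p4)))

def y_intercept(s):
    (x1, y1), (x2, y2) = s
    return y1 - x1 * (y2 - y1) / (x2 - x1)

def longest_increasing_s_monotone(segs):
    """phi(0, n + 1) via recurrence (1)."""
    segs = sorted(segs, key=y_intercept)          # order along the y-axis
    n = len(segs)
    # Virtual sentinels s_0, s_{n+1}: extreme slopes, left endpoint -inf, disjoint from all.
    slope = [-math.inf] + [(q[1] - p[1]) / (q[0] - p[0]) for p, q in segs] + [math.inf]
    left = [-math.inf] + [min(p[0], q[0]) for p, q in segs] + [-math.inf]

    def disjoint(a, b):
        if a in (0, n + 1) or b in (0, n + 1):
            return True
        return not segments_intersect(segs[a - 1], segs[b - 1])

    phi = {}
    for gap in range(1, n + 2):                   # by increasing j - i
        for i in range(0, n + 2 - gap):
            j = i + gap
            if not (slope[i] < slope[j] and disjoint(i, j)):
                continue                          # S_ij is only defined for such pairs
            best = 0                              # empty S_ij gives phi(i, j) = 0
            for k in range(i + 1, j):             # (S1)
                if (slope[i] < slope[k] < slope[j]              # (S2)
                        and left[k] > max(left[i], left[j])      # (S3)
                        and disjoint(i, k) and disjoint(k, j)):  # (S4)
                    best = max(best, phi[(i, k)] + phi[(k, j)] + 1)
            phi[(i, j)] = best
    return phi[(0, n + 1)]

def mirror(segs):
    """Reflect x -> -x: y-intercepts are unchanged and slopes change sign."""
    return [((-q[0], q[1]), (-p[0], p[1])) for p, q in segs]

def s_monotone_size(segs):
    """Longer of the longest increasing and longest decreasing s-monotone subsequences."""
    return max(longest_increasing_s_monotone(segs),
               longest_increasing_s_monotone(mirror(segs)))
```

Processing pairs by increasing $j - i$ guarantees that $\phi(i, k)$ and $\phi(k, j)$ exist whenever $s_k$ passes (S2) and (S4), because those conditions are exactly what makes $S_{ik}$ and $S_{kj}$ defined.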



3 A $\kappa^{1/4}$-Approximation Algorithm for Segments

We now present a faster algorithm at the expense of a larger approximation factor. The algorithm again tries to find a large subset of $I^*(S)$ that has a certain special structure, which allows its computation in polynomial time. We assume that all the segments of S intersect the y-axis and are sorted in increasing order by the x-coordinates of their left endpoints. Let $S = \langle s_1, \ldots, s_n \rangle$ denote this sequence. Let $c_i$ be the y-coordinate of the intersection point of $s_i$ with the y-axis.
Fig. 2. The four types of l-monotone sequences described in Lemma 3.



Lemma 3. Let $I \subseteq S$ be a subsequence of pairwise-disjoint segments. Then there exists a subsequence $I'' = \langle s_1, \ldots, s_m \rangle$ such that $I'' \subseteq I$, $|I''| \geq |I|^{1/4}$, and it has one of the following properties: for all $1 \leq i < m$,

(L1) $r(s_i) < r(s_{i+1})$ and $c_i < c_{i+1}$ (Figure 2(a)),

(L2) $r(s_i) < r(s_{i+1})$ and $c_i > c_{i+1}$ (Figure 2(b)),

(L3) $r(s_i) > r(s_{i+1})$ and $c_i < c_{i+1}$ (Figure 2(c)),

(L4) $r(s_i) > r(s_{i+1})$ and $c_i > c_{i+1}$ (Figure 2(d)).

Proof. By Dilworth’s theorem, there exists a subsequence $I' \subseteq I$ of length at least $\sqrt{|I|}$ such that the x-coordinates of the right endpoints are either monotonically increasing or monotonically decreasing. Again applying Dilworth’s theorem to $I'$, one can find a subsequence $I'' \subseteq I'$ of length at least $\sqrt{|I'|} \geq |I|^{1/4}$ such that the values $c_i$ for $s_i \in I''$ are either monotonically increasing or monotonically decreasing.

If $I'$ is increasing and $I''$ increasing (resp. decreasing), then the sequence is of type L1 (resp. L2). If $I'$ is decreasing and $I''$ increasing (resp. decreasing), then the sequence is of type L3 (resp. L4). □

We refer to a subsequence of pairwise-disjoint segments of S that satisfies one of the properties (L1)-(L4) as an l-monotone sequence. The following property of l-monotone sequences allows us to compute them efficiently.

Lemma 4. Let $S' = \langle s_1, \ldots, s_m \rangle$ be a sequence of segments such that (i) one of the conditions (L1)-(L4) is satisfied, and (ii) $s_i \cap s_{i+1} = \emptyset$ for all i. Then $S'$ is an l-monotone sequence.

Proof. It is clear that for segments of type (L1)-(L4), two segments $s_i$ and $s_j$ with $i < j - 1$ cannot intersect unless some pair of consecutive segments $s_k, s_{k+1}$ with $i \leq k < j$ also intersects. Therefore if $s_i \cap s_{i+1} = \emptyset$ for all i, then the segments are pairwise non-intersecting, and hence l-monotone. □

By Lemma 4, the segments in any sequence satisfying one of (L1)-(L4) are pairwise non-intersecting if the adjacent segments do not intersect (see Figure 2).

We present an algorithm that, given a sequence S of segments, computes the longest l-monotone subsequence of each type. By Lemma 3, the longest of them gives an independent set of size at least $\kappa^{1/4}$. We describe an algorithm for computing the longest l-monotone subsequence of type (L1); the rest can be computed analogously.

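Define $S_j$ to be the set of segments such that $s_k \in S_j$ if

(i) $k < j$, (ii) $r(s_k) < r(s_j)$, (iii) $c_k < c_j$, (iv) $s_k \cap s_j = \emptyset$.

Let $\Phi(j)$ be the longest l-monotone subsequence of type (L1) of $\langle s_1, \ldots, s_j \rangle$ that ends with $s_j$. Set $\phi(j) = |\Phi(j)|$. We wish to compute $\max_{1 \leq j \leq n} \phi(j)$.

Lemma 5. For $1 \leq j \leq n$,

$$\phi(j) = \max_{s_k \in S_j} \phi(k) + 1,$$

where the maximum over an empty set is 0.

Proof. Let $\Phi(j) = \langle s_{a_1}, \ldots, s_{a_u} = s_j \rangle$. Clearly $\langle s_{a_1}, \ldots, s_{a_{u-1}} \rangle$ is an l-monotone sequence of type (L1) ending with $s_{a_{u-1}}$, and $s_{a_{u-1}} \in S_j$. Therefore $\phi(a_{u-1}) \geq u - 1$, and hence $\phi(j) = u \leq \max_{s_k \in S_j} \phi(k) + 1$. Conversely, Lemma 4 implies that for any $s_k \in S_j$, the sequence $\Phi(k) \circ \langle s_j \rangle$ is an l-monotone sequence. Hence, $\phi(j) \geq \phi(k) + 1$. □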



Naively, it takes $O(n)$ time to compute each $\phi(j)$, provided that $\phi(k)$ for all $k < j$ have already been computed. This yields an $O(n^2)$ time algorithm. We now describe a more sophisticated approach that exploits geometry to compute each $\phi(j)$ in $O(n^{1/3} \log^c n)$ time.
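The naive $O(n^2)$ evaluation of Lemma 5 is short enough to write out. The sketch below runs it once per type (L1)-(L4) by flipping the comparisons on r and on c, and returns the longest result; by Lemma 4 it only tests each new segment against the candidate predecessor for intersection. Assumptions: non-vertical closed segments `((x1, y1), (x2, y2))` crossing the y-axis, with distinct left endpoints. The intersection test is the standard orientation predicate, and all names are illustrative. The data structure described next is what would replace the inner loop in the faster algorithm.

```python
def orient(p, q, r):
    """Twice the signed area of triangle pqr."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def in_box(p, q, r):
    """Whether r lies in the bounding box of pq (used when r is collinear with pq)."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(s, t):
    """Closed segments s and t share at least one point."""
    (p1, p2), (p3, p4) = s, t
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    return ((d1 == 0 and in_box(p3, p4, p1)) or (d2 == 0 and in_box(p3, p4, p2))
            or (d3 == 0 and in_box(p1, p2, p3)) or (d4 == 0 and in_box(p1, p2, p4)))

def y_intercept(s):
    (x1, y1), (x2, y2) = s
    return y1 - x1 * (y2 - y1) / (x2 - x1)

def longest_l_monotone(segs):
    """Length of a longest l-monotone subsequence over the four types (L1)-(L4)."""
    segs = sorted(segs, key=lambda s: min(s[0][0], s[1][0]))  # by left endpoint
    r = [max(p[0], q[0]) for p, q in segs]                     # right endpoints
    c = [y_intercept(s) for s in segs]                         # y-axis crossings
    best = 0
    # (sr, sc) = (+1, +1) is (L1), (+1, -1) is (L2), (-1, +1) is (L3), (-1, -1) is (L4).
    for sr, sc in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        phi = []
        for j in range(len(segs)):
            prev = 0                                           # max over S_j, 0 if empty
            for k in range(j):
                if (sr * (r[j] - r[k]) > 0 and sc * (c[j] - c[k]) > 0
                        and not segments_intersect(segs[k], segs[j])):
                    prev = max(prev, phi[k])
            phi.append(prev + 1)                               # Lemma 5
        best = max(best, max(phi, default=0))
    return best
```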

Let $M_j = \langle s_1, \ldots, s_j \rangle$. We compute $\phi(j)$ sequentially for $j = 1, \ldots, n$, maintaining a data structure $\Psi$ that stores all the segments of $M_{j-1}$. Given the segment $s_j$, the data structure returns $\max_{s_k \in S_j} \phi(k)$. Once we have computed $\phi(j)$, we insert the segment $s_j$ together with its weight $\phi(j)$ into the data structure. Note that after we have inserted $s_j$, the data structure stores all the segments in the set $M_j$.

We now describe $\Psi$, a three-level data structure that stores a set $\Sigma$ of weighted segments. For a query segment $\gamma$ intersecting the y-axis, it returns the maximum weight of a segment $s \in \Sigma$ such that $r(s) < r(\gamma)$, $\gamma \cap s = \emptyset$, and $\gamma$ intersects the y-axis above s. The first level is a balanced binary search tree $T_r(\Sigma)$ on the x-coordinates of the right endpoints of the segments in $\Sigma$. Let $C_u$ denote the “canonical” subset of segments stored in the subtree rooted at $u \in T_r(\Sigma)$. For each node u, the second-level data structure is a balanced binary search tree $T_c(C_u)$ on $\{c(s) \mid s \in C_u\}$. Let $C_u^v \subseteq C_u$ denote the set of segments stored in the subtree rooted at $v \in T_c(C_u)$. Finally, for the set of segments $C_u^v$, we construct a segment-intersection data structure $D(C_u^v)$ as described in [1]. It stores a family of canonical subsets of $C_u^v$ in a tree-like structure. The total size of the data structure is $O(n^{4/3} \log^c n)$, and it can be constructed in $O(n^{4/3} \log^c n)$ time. For a query segment $\gamma$, we can report in $O(n^{1/3} \log^c n)$ time the segments of $C_u^v$ not intersecting $\gamma$ as a union of $O(n^{1/3} \log^c n)$ canonical subsets. For each canonical subset $A \subseteq C_u^v$, we store the maximum weight $w_A$ of a segment in A. The overall data structure $\Psi$ can be constructed in time $O(n^{4/3} \log^c n)$.

For a query segment $\gamma$, we wish to report $\max \phi(s)$, where the maximum is taken over all segments s of $\Sigma$ such that $r(s) < r(\gamma)$, $s \cap \gamma = \emptyset$, and the intersection point of s with the y-axis lies below that of $\gamma$. We query the first-level tree $T_r$ with the right endpoint of $\gamma$ and identify a set $V_1$ of $O(\log n)$ nodes such that $\bigcup_{u \in V_1} C_u$ is the set of segments whose right endpoints lie to the left of $r(\gamma)$. Next, for each $u \in V_1$, we compute a set $V_2(u)$ of $O(\log n)$ nodes such that $\bigcup_{v \in V_2(u)} C_u^v$ is the set of segments of $C_u$ that intersect the y-axis below $\gamma$. For each $v \in V_2(u)$, we compute the maximum weight $w_v$ of a segment in $C_u^v$ that does not intersect $\gamma$ using the third-level data structure. We return $\max_{u \in V_1,\, v \in V_2(u)} w_v$. The total time spent is $O(n^{1/3} \log^c n)$.

Finally, we can use the standard dynamization technique of Bentley and Saxe [4] to handle insertions in the data structure. Since the data structure can be constructed in $O(n^{4/3} \log^c n)$ time, the amortized insertion time is $O(n^{1/3} \log^{c+1} n)$. However, in our application we know in advance all the segments that we want to insert: they are the segments of S. We set the weight of all segments to 0, and construct $\Psi$ on all the segments of S. When we wish to insert a segment, we update its weight and update the weight of the appropriate canonical subsets at the third level of $\Psi$. Omitting all the details, we conclude the following.

Theorem 2. Given a set S of n segments in $\mathbb{R}^2$, all intersecting a common vertical line, one can compute an independent set of size at least $\kappa(S)^{1/4}$ in time $O(n^{4/3} \log^c n)$.
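The idea of building the structure once over all segments with weight 0 and then raising weights instead of inserting can be illustrated in one dimension. The sketch below is only an analogue of the first level of $\Psi$: a Fenwick (binary indexed) tree over a static set of keys, such as right-endpoint x-coordinates. It supports raising the weight of a key and querying the maximum weight among keys strictly smaller than a given value. It relies on weights only increasing (from 0 to $\phi(j) \geq 1$), which holds in the algorithm above. It omits the second level (order along the y-axis) and the third-level segment-intersection structure of [1] entirely, so it is not the paper's data structure; the class and method names are hypothetical.

```python
import bisect

class PrefixMaxTree:
    """Fenwick tree over a static sorted key set; all weights start at 0.

    'Inserting' a key means raising its weight, which is valid here because
    weights never decrease.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.tree = [0] * (len(self.keys) + 1)  # 1-based; tree[0] unused

    def raise_weight(self, key, w):
        i = bisect.bisect_left(self.keys, key) + 1  # 1-based position of key
        while i < len(self.tree):
            self.tree[i] = max(self.tree[i], w)
            i += i & -i

    def max_below(self, key):
        """Maximum weight among stored keys strictly smaller than key (0 if none)."""
        i = bisect.bisect_left(self.keys, key)  # number of keys < key
        best = 0
        while i > 0:
            best = max(best, self.tree[i])
            i -= i & -i
        return best

def longest_increasing_by_key(rs):
    """phi(j) = 1 + max phi(k) over earlier k with r_k < r_j (only the r-condition)."""
    tree = PrefixMaxTree(rs)
    best = 0
    for r in rs:
        phi = tree.max_below(r) + 1
        tree.raise_weight(r, phi)
        best = max(best, phi)
    return best
```

For instance, `longest_increasing_by_key([3, 1, 2, 5, 4])` evaluates the recurrence with only the right-endpoint condition of (L1) and finds a chain of length 3.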

4 Independent Set for Arbitrary Segments 

Let S be a set of arbitrary segments in $\mathbb{R}^2$. We describe a recursive algorithm for computing an independent set of S. Let $\ell$ be the vertical line passing through the median x-coordinate of the right endpoints of segments in S, i.e., at most $\lceil n/2 \rceil$ segments have their right endpoints on each side of $\ell$. We partition S into three sets, $S_L$, $S_R$ and $S^*$. $S_L$ (resp. $S_R$) is the subset of segments that lie completely to the left (resp. right) of $\ell$, and $S^*$ is the subset of segments whose interiors intersect $\ell$.

We compute an independent set $I^*$ of $S^*$, and recursively compute an independent set $I_L$ (resp. $I_R$) of $S_L$ (resp. $S_R$). Since the segments in $S_L$ do not intersect any segments in $S_R$, $I_L \cup I_R$ is an independent set of $S_L \cup S_R$. We return either $I^*$ or $I_L \cup I_R$, whichever is larger.
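A compact sketch of this recursive scheme follows, under these assumptions. Segments are closed endpoint pairs. A segment that merely touches $\ell$ is put in $S^*$ (the text uses segments whose interiors meet $\ell$), which keeps the three groups a partition and guarantees progress, since the segment defining the median always lands in $S^*$. The solver for the stabbed group is passed in as a callback `stabbed_solver(group, x0)`, a hypothetical interface standing in for the algorithms of Sections 2 and 3, which apply after translating $x_0$ to the y-axis.

```python
def recursive_independent_set(segs, stabbed_solver):
    """Section 4 scheme: split at the median right endpoint, recurse, keep the larger answer.

    segs: list of closed segments ((x1, y1), (x2, y2)).
    stabbed_solver(group, x0): hypothetical callback returning an independent subset
    of `group`, all of whose segments meet the vertical line x = x0.
    """
    if not segs:
        return []
    rights = sorted(max(p[0], q[0]) for p, q in segs)
    x0 = rights[(len(rights) - 1) // 2]                       # median right endpoint
    s_left = [s for s in segs if max(s[0][0], s[1][0]) < x0]  # entirely left of the line
    s_right = [s for s in segs if min(s[0][0], s[1][0]) > x0]  # entirely right of the line
    s_star = [s for s in segs
              if min(s[0][0], s[1][0]) <= x0 <= max(s[0][0], s[1][0])]
    i_star = stabbed_solver(s_star, x0)
    # S_L and S_R are separated by the line, so their answers can simply be merged.
    combined = (recursive_independent_set(s_left, stabbed_solver)
                + recursive_independent_set(s_right, stabbed_solver))
    return i_star if len(i_star) >= len(combined) else combined
```

As a usage example, with a solver that returns every stabbed segment, four pairwise-disjoint horizontal segments over $[0,1]$, $[2,3]$, $[4,5]$, $[6,7]$ yield two of them. This matches the bound of Theorem 3 below for $\kappa = n = 4$ when $\zeta$ is the identity, since $\kappa/(2\log(2n/\kappa)) = 2$.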

Suppose our algorithm computes an independent set of size at least $\mu(n, \kappa)$, where $\kappa = \kappa(S)$. Let $n_L = |S_L|$, $n_R = |S_R|$, $\kappa_L = \kappa(S_L)$, $\kappa_R = \kappa(S_R)$, and $\kappa^* = \kappa(S^*)$. Suppose the algorithm for computing an independent set of $S^*$ returns a set of size at least $\zeta(\kappa^*)$. Then

$$\mu(n, \kappa) \geq \max\{\mu(n_L, \kappa_L) + \mu(n_R, \kappa_R),\ \zeta(\kappa^*)\},$$

where $\kappa_L + \kappa_R + \kappa^* \geq \kappa$, $n_L, n_R \leq n/2$, $n_L \geq \kappa_L$, and $n_R \geq \kappa_R$. Since $\mu$ and $\zeta$ are sub-linear functions, it can be argued that

$$\mu(n, \kappa) \geq \max\{\mu(n/2, \kappa - \kappa^*),\ \zeta(\kappa^*)\}.$$

It can be shown that the solution to the above recurrence is

$$\mu(n, \kappa) \geq \zeta\left(\frac{\kappa}{2\log(2n/\kappa)}\right).$$

If we can compute the independent set of $S^*$ in time $t(n)$, then the running time of the algorithm is $O(t(n) \log n)$. If $t(n) >$ then the running time is $O(t(n))$. Hence we conclude the following.

Theorem 3. Let A be a set of m segments, all of which intersect a common vertical line. Suppose we can compute an independent set of A of size at least $\zeta(\kappa(A))$ in time $t(m)$. Then for any set S of n segments in $\mathbb{R}^2$ we can compute an independent set of size at least $\zeta(\kappa/(2\log(2n/\kappa)))$, where $\kappa = \kappa(S)$. The running time is $O(t(n))$ if $t(n) >$ and $O(t(n) \log n)$ otherwise.

Corollary 1. For a set S of n segments in $\mathbb{R}^2$, one can compute an independent set of size at least (i) $(\kappa/(2\log(2n/\kappa)))^{1/2}$ in time $O(n^3)$ and (ii) $(\kappa/(2\log(2n/\kappa)))^{1/4}$ in time $O(n^{4/3} \log^c n)$.

5 Independent Set for Convex Objects 

We now briefly describe how the results of the previous sections can be extended to find an independent set in a set of convex objects in $\mathbb{R}^2$. Let $S = \{s_1, \ldots, s_n\}$ be a set of convex objects in the plane, and let $I^*(S)$ be a maximum independent set of S. As for segments, we describe an algorithm for the case in which all objects in S intersect the y-axis. We can then use the approach in Section 4 to handle the general case. Define $l(s_i)$ (resp. $r(s_i)$) to be the smallest (resp. largest) x-coordinate of all the points $p \in s_i$. Let $c_i$ be the maximum y-coordinate of the intersection of $s_i$ with the y-axis. Again assume that S is sorted in increasing order of the x-coordinates of the leftmost points. An application of Dilworth’s theorem similar to Lemma 3 gives the following.

Lemma 6. Given a set $I \subseteq S$ of pairwise disjoint convex objects, there exists a subsequence $I' = \langle s_1, \ldots, s_m \rangle$, where $|I'| \geq |I|^{1/4}$ and $I'$ has one of the following structures: for all $1 \leq i < m$,

(C1) $r(s_i) < r(s_{i+1})$ and $c_i < c_{i+1}$ (Figure 3(a)), or
(C2) $r(s_i) < r(s_{i+1})$ and $c_i > c_{i+1}$ (Figure 3(a)), or
(C3) $r(s_i) > r(s_{i+1})$ (Figure 3(b)).

Sequences satisfying condition (C1) or (C2) can be computed using a dynamic-programming approach similar to the one in Section 3. We outline the algorithm for computing longest subsequences of type (C3).

For $1 \leq i < j \leq n$ such that $s_i \cap s_j = \emptyset$, let $S_{ij} \subseteq S$ denote the subsequence of objects $s_k$ such that (i) $c_i < c_k < c_j$, (ii) $\max\{l(s_i), l(s_j)\} < l(s_k) < r(s_k) < \min\{r(s_i), r(s_j)\}$, and (iii) $s_i \cap s_k = \emptyset$ and $s_j \cap s_k = \emptyset$. See Figure 3(c). Let $\Phi(i, j) \subseteq S_{ij}$ denote the longest subsequence of type (C3) of $S_{ij}$. Set $\phi(i, j) = |\Phi(i, j)|$. Then we can prove the following.

Lemma 7. For all $1 \leq i < j \leq n$,

$$\phi(i, j) = \max_{s_k \in S_{ij}} \left( \phi(i, k) + \phi(k, j) + 1 \right). \qquad (2)$$
We can compute $\phi(i, j)$ using a dynamic-programming approach. Assuming $\phi(i, k)$ and $\phi(k, j)$ have already been computed, we only need to compute the set $S_{ij}$. Suppose we can preprocess S in time $\tau(S)$ so that we have the following information at our disposal: (P1) $l(s_i), r(s_i), c_i$ for each $s_i \in S$, (P2) whether $s_i \cap s_j = \emptyset$ for each $s_i, s_j \in S$. Then the set $S_{ij}$ can be computed in time $O(n)$. Hence, we can compute all values $\phi(i, j)$ in $O(n^3 + \tau(S))$ time. Plugging this procedure into the recursive scheme of Section 4, we obtain the following.

Theorem 4. Let S be a set of n convex objects in $\mathbb{R}^2$ such that (P1)-(P2) can be computed in $\tau(S)$ time. Then an independent set of size at least $(\kappa/(2\log(2n/\kappa)))^{1/4}$ can be computed in time $O(n^3 + \tau(S))$.




Fig. 3. (a) Sequences of type (C1) (bold) and (C2) (dashed), (b) sequence of type (C3), (c) $S_{ij}$ (solid).



6 Conclusions 

In this paper we have presented algorithms for approximating the maximum independent 
set in the intersection graphs of convex objects in the plane. The approximation ratio is 
better for the case where the convex objects are line segments. 

The overall structure of the algorithms is as follows: given a set of disjoint objects, first show that there exists a large subset with some special (separator-like) property. Then show that this subset can be computed exactly from amongst the entire set (we used dynamic programming). One abstract approach towards improving these results is to only approximate the subset in the second step, instead of computing it exactly. This might allow one to relax the required properties, thereby increasing the size of the subset and improving the approximation ratio.

We leave it as an open problem whether the approximation ratios can be improved. In particular, is it possible to design an approximation algorithm with a better ratio for the case of general convex objects (all intersecting a vertical line)? Similarly, is it possible to approximate the independent set of line segments better than $\sqrt{k}$? For axis-parallel rectangles, devising an algorithm with approximation ratio $o(\log n)$ remains an intriguing open problem.
Abstract. Let $A$ and $B$ be two sets of $n$ resp. $m$ disjoint unit disks in the plane, with $m \ge n$. We consider the problem of finding a translation or rigid motion of $A$ that maximizes the total area of overlap with $B$. The function describing the area of overlap is quite complex, even for combinatorially equivalent translations, and hence we turn our attention to approximation algorithms. We give deterministic $(1-\varepsilon)$-approximation algorithms for translations and for rigid motions, which run in $O((nm/\varepsilon^2)\log(m/\varepsilon))$ and $O((n^2m^2/\varepsilon^3)\log m)$ time, respectively.

For rigid motions, we can also compute a $(1-\varepsilon)$-approximation in $O((m^2 n^{4/3}\Delta^{1/3}/\varepsilon^3)\log n\log m)$ time, where $\Delta$ is the diameter of set $A$. Under the condition that the maximum area of overlap is at least a constant fraction of the area of $A$, we give a probabilistic $(1-\varepsilon)$-approximation algorithm for rigid motions that runs in $O((m^2/\varepsilon^4)\log(m/\varepsilon)\log^2 m)$ time.

Our results generalize to the case where $A$ and $B$ consist of possibly intersecting disks of different radii, provided that (i) the ratio of the radii of any two disks in $A \cup B$ is bounded, and (ii) within each set, the maximum number of disks with a non-empty intersection is bounded.
1 Introduction

Shape matching is a fundamental problem in computational geometry with ap- 
plications in computer vision: given two shapes A and B and a distance measure, 
one wants to determine a transformation of A — such as a translation or a rigid 
motion — that minimizes its distance to B. Typical problems include: match- 
ing point sets with respect to the Hausdorff or bottleneck distance and matching 
polygons with respect to the Hausdorff or Frechet distance between their bound- 
aries; see Alt and Guibas [4] for a survey. The area of overlap of two polygons 
is less sensitive to noise than the Hausdorff or Frechet distance between their 
boundaries [8,3] and therefore more appropriate for certain applications. 

Mount et al. [11] showed that the function of the area of overlap of two simple polygons, with $n$ and $m$ vertices, under translation is continuous and has $O((nm)^2)$ pieces, with each piece being a polynomial of degree at most two. A representation of the function can be computed in $O((nm)^2)$ time. No algorithm is known that computes the translation that maximizes the area of overlap without computing the complete representation of the overlap function. One of the open problems mentioned by Mount et al. was to give efficient matching algorithms for objects with curved boundaries. De Berg et al. [8] gave an $O((n+m)\log(n+m))$ algorithm for determining the optimal translation for convex polygons, while Alt et al. [3] gave a constant-factor approximation of the minimum area of symmetric difference of two convex shapes.

We study the following problem: given two sets $A$ and $B$ of disks in the plane, we would like to find a rigid motion that maximizes the area of overlap. Our main goal is to match two shapes, each being expressed as a union of disks; thus the overlap we want to maximize is the overlap between the two unions (which is not the same as the sum of overlaps of the individual disks). In the most general setting we assume the following: (i) the largest disk is only a constant factor larger than the smallest one, and (ii) any disk in $A$ intersects only a constant number of other disks in $A$, and the same holds for $B$.

Since any two- or three-dimensional shape can be efficiently approximated by a finite union of disks or balls (see, for example, the works by O'Rourke and Badler [12] and Amenta and Kolluri [5]), our algorithms can be used to match a variety of shapes. Ranjan and Fournier [13] also used the union-of-disks or union-of-spheres representation to interpolate between two shapes. The assumptions (i) and (ii) above will often be satisfied when disks or balls are used to approximate objects, although the constant in assumption (i) may become large when the approximated objects have fine details. Moreover, both assumptions make perfect sense in molecular modelling with the hard sphere model [10]. Under this model the radii range of the spheres is fairly restricted and no center of a sphere can be inside another sphere; a simple packing argument shows that the latter implies assumption (ii). A related problem with applications in protein shape matching was examined by Agarwal et al. [1], who gave algorithms for minimizing the Hausdorff distance between two unions of disks or balls under translations.

Another application comes from weighted point-set matching. Consider a 
two- or three-dimensional shape that is reduced to a set of descriptive feature 
points, each of them weighted according to its importance. For example, a curve
or a contour can be reduced to a set of points of high curvature. We can assign 
to each point a ball centered at it with radius relative to its weight. Thus, the 
two shapes are represented as unions of balls and a possible measure of their 
similarity is the area of overlap of the two unions. 

Recently, Cheong et al. [6] gave an almost linear, probabilistic approximation 
algorithm that computes the maximum area of overlap under translations up to 
an absolute error with high probability. When the maximum overlap is at least a 
constant fraction of the area of one of the two sets, the absolute error is in fact a 
relative error. This is usually good enough for shape matching, since if two shapes 
are quite dissimilar we usually do not care about how bad the match exactly 
is. A direct application of the technique of Cheong et al. to rigid motions gives an $O((m^3/\varepsilon^6)\log(m/\varepsilon)\log^3 m)$ time algorithm that requires the computation of intersection points of algebraic curves, which is not very practical.

Our contributions are the following. First, we show in Section 2 that the maximum number of combinatorially distinct translations of $A$ with respect to $B$ can be as high as $O(n^2m)$. When rotations are considered as well, the complexity is $O(n^3m^2)$. Moreover, the function describing the area of overlap is quite complex, even for combinatorially equivalent placements. Therefore, the focus of our paper is on approximation algorithms. Next, we give a lower bound on the maximum area of overlap under translations, expressed in the number of pairs of disks that contribute to that area. This is a vital ingredient of almost all our algorithms.

In the remaining sections, we present our approximation algorithms. For the sake of clarity we describe the algorithms for the case of disjoint unit disks. It is not hard to adapt the algorithms to sets of disks satisfying assumptions (i) and (ii) above. Due to lack of space, the necessary changes for the latter, and several other proofs, are omitted; these can be found in the full version of this paper. For any $\varepsilon > 0$, our algorithms can compute a $(1-\varepsilon)$-approximation of the optimum overlap. For translations we give an algorithm that runs in $O((nm/\varepsilon^2)\log(m/\varepsilon))$ time. This is worse than the algorithm of Cheong et al., but our algorithm is deterministic and our error is always relative, even when the optimum is small. It also forms an ingredient of our algorithm for rigid motions, which runs in $O((n^2m^2/\varepsilon^3)\log m)$ time. If $\Delta$ is the diameter of set $A$ (recall that we are dealing with unit disks), the running time of the latter becomes $O((m^2 n^{4/3}\Delta^{1/3}/\varepsilon^3)\log n\log m)$, which yields an improvement when $\Delta = o(n^2/\log^3 n)$. Note that in many applications the union will be connected, which implies that the diameter will be $O(n)$. If the area of overlap is a constant fraction of the area of the union of $A$, we can get a probabilistic algorithm for rigid motions that runs in $O((m^2/\varepsilon^4)\log(m/\varepsilon)\log^2 m)$ time and succeeds with high probability.




Maximizing the Area of Overlap of Two Unions of Disks 



141 



2 Basic Properties of the Overlap Function 

Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$ be two sets of disjoint unit disks in the plane, with $n \le m$. We consider the disks to be closed. Both $A$ and $B$ lie in the same two-dimensional coordinate space, which we call the work space; their initial position is denoted simply by $A$ and $B$. We consider $B$ to be fixed, while $A$ can be translated and/or rotated relative to $B$.

Let $\mathcal{I}$ be the infinite set of all possible rigid motions (also called isometries) in the plane; we call $\mathcal{I}$ the configuration space. We denote by $R_\theta$ a rotation about the origin by some angle $\theta \in [0, 2\pi)$ and by $T_t$ a translation by some $t \in \mathbb{R}^2$. It will be convenient to model the space $[0, 2\pi)$ of rotations by points on the circle $S^1$. For simplicity, rotated-only versions of $A$ are denoted by $A(\theta) = \{A_1(\theta), \ldots, A_n(\theta)\}$. Similarly, translated-only versions of $A$ are denoted by $A(t) = \{A_1(t), \ldots, A_n(t)\}$. Any rigid motion $I \in \mathcal{I}$ can be uniquely defined as a translation followed by a rotation, that is, $I = I_{t,\theta} = R_\theta \circ T_t$, for some $\theta \in S^1$ and $t \in \mathbb{R}^2$. Alternatively, a rigid motion can be seen as a rotation followed by some translation; it will always be clear from the context which definition is used. In general, transformed versions of $A$ are denoted by $A(t,\theta) = \{A_1(t,\theta), \ldots, A_n(t,\theta)\}$ for some $I_{t,\theta} \in \mathcal{I}$.

Let $\mathrm{Int}(C)$ and $\mathcal{V}(C)$ be, respectively, the interior and area of a compact set $C \subset \mathbb{R}^2$, and let $V_{ij}(t,\theta) = \mathcal{V}(A_i(t,\theta) \cap B_j)$. The area of overlap of $A(t,\theta)$ and $B$, as $t,\theta$ vary, is a function $V : \mathcal{I} \to \mathbb{R}$ with $V(t,\theta) = \mathcal{V}((\bigcup A(t,\theta)) \cap (\bigcup B))$. Thus the problem that we are studying can be stated as follows: given two sets $A$, $B$, defined as above, compute a rigid motion $I_{t_{opt},\theta_{opt}}$ that maximizes $V(t,\theta)$.

Let $d_{ij}(t,\theta)$ be the Euclidean distance between the centers of $A_i(t,\theta)$ and $B_j$. Also, let $r_i$ be the Euclidean distance of $A_i$'s center to the origin. For simplicity, we write $d_{ij}(t)$ when $\theta$ is fixed and $V(\theta), V_{ij}(\theta), d_{ij}(\theta)$ when $t$ is fixed.

The Minkowski sum of two planar sets $A$ and $B$, denoted by $A \oplus B$, is the set $\{p_1 + p_2 : p_1 \in A, p_2 \in B\}$. Similarly, the Minkowski difference $A \ominus B$ is the set $\{p_1 - p_2 : p_1 \in A, p_2 \in B\}$.

Theorem 1. Let $A$ be a set of $n$ disjoint unit disks in the plane, and $B$ a set of $m$ disjoint unit disks, with $n \le m$. The maximum number of combinatorially distinct placements of $A$ with respect to $B$ is $O(n^2m)$ under translations, and $O(n^3m^2)$ under rigid motions.

This theorem implies that explicitly computing the subdivision of the configuration space into cells with combinatorially equivalent placements is highly
expensive. Moreover, the computation for rigid motions can cause non-trivial 
numerical problems since it involves algebraic equations of degree six [4]. Fi- 
nally, the optimization problem in a cell of this decomposition is far from easy: 
one has to maximize a function consisting of a linear number of terms. There- 
fore we turn our attention to approximation algorithms. The following theorem, 
which gives a lower bound on the maximum area of overlap, will be instrumental 
in obtaining a relative error. 
Theorem 2. Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$ be two sets of disjoint unit disks in the plane. Let $t_{opt}$ be the translation that maximizes the area of overlap $V(t)$ of $A(t)$ and $B$ over all possible translations $t$ of set $A$. If $k_{opt}$ is the number of overlapping pairs $A_i(t_{opt}), B_j$, then $V(t_{opt})$ is $\Theta(k_{opt})$.

Proof. First, note that $V(t_{opt}) \le k_{opt}\pi$. Since we are considering only translations, the configuration space is two-dimensional. For each pair $A_i(t_{opt}), B_j$ for which $A_i(t_{opt}) \cap B_j \neq \emptyset$, we draw, in configuration space, the region of translations $\mathcal{K}_{ij}$ that bring the center of $A_i(t_{opt})$ into $B_j$; see Figure 1. Such a region is a unit disk that is centered at a distance at most 2 from $t_{opt}$. Thus, all regions $\mathcal{K}_{ij}$ are fully contained in a disk $\mathcal{R}$, centered at $t_{opt}$, of radius 3. By a simple volume argument (the $k_{opt}$ disks $\mathcal{K}_{ij}$ have total area $k_{opt}\pi$, while $\mathcal{R}$ has area $9\pi$), there must be a point $t^* \in \mathcal{R}$ (which represents a translation) that is covered by at least $k_{opt}/9$ disks $\mathcal{K}_{ij}$. Each of the corresponding pairs $A_i(t^*), B_j$ has an overlap of at least $2\pi/3 - \sqrt{3}/2$. Thus, $V(t_{opt}) \ge V(t^*) \ge (2\pi/27 - \sqrt{3}/18)\,k_{opt}$. □

Fig. 1. All disks $\mathcal{K}_{ij}$ are confined within a disk $\mathcal{R}$ of area $9\pi$.

3 A $(1-\varepsilon)$-Approximation Algorithm for Translations

Theorem 2 suggests the following simple approximation algorithm: compute the arrangement of the regions $\mathcal{K}_{ij}$, with $i = 1, \ldots, n$ and $j = 1, \ldots, m$, and pick any point of maximum depth. Such a point corresponds to a translation $t$ that gives a constant-factor approximation. An analysis similar to that of Theorem 2 leads to a factor of $2/3 - \sqrt{3}/(2\pi) \approx 0.39$. It is possible to do much better, however.
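The constants in Theorem 2 and in the factor above all come from the overlap of two unit disks whose centers are at distance 1. A small Python check (the helper name `lens_area` is ours) evaluates the standard lens-area formula and confirms the closed forms numerically:

```python
import math

def lens_area(d):
    """Area of the intersection of two unit disks whose centers are d apart."""
    if d >= 2.0:
        return 0.0
    return 2.0 * math.acos(d / 2.0) - (d / 2.0) * math.sqrt(4.0 - d * d)

per_pair = lens_area(1.0)            # center of A_i inside B_j: distance at most 1
print(round(per_pair, 4))            # 1.2284, i.e. 2*pi/3 - sqrt(3)/2
print(round(per_pair / 9, 4))        # 0.1365, the constant 2*pi/27 - sqrt(3)/18 of Theorem 2
print(round(per_pair / math.pi, 3))  # 0.391, the factor 2/3 - sqrt(3)/(2*pi)
```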

We first present a deterministic $(1-\varepsilon)$-approximation algorithm. Since both $A$ and $B$ consist of disjoint disks, we have $V(t) = \sum_{A_i \in A}\sum_{B_j \in B} V_{ij}(t)$. The algorithm is based on sampling the configuration space by using a uniform grid. This is possible due to the following lemma, which implies that, in terms of absolute error, it is not too bad if we choose a translation that is close to the optimal one.

Lemma 1. Let $k$ be the number of overlapping pairs $A_i(t,\theta), B_j$ for some $t \in \mathbb{R}^2$, $\theta \in [0, 2\pi)$. For any given $\delta > 0$ and any $t' \in \mathbb{R}^2$ for which $|t - t'| = O(\delta)$, we have $V(t',\theta) = V(t,\theta) - O(k\delta)$.

Instead of computing V{t) for every grid translation directly, we use a sim- 
ple voting scheme that speeds up the algorithm by a linear factor. Algorithm 
Translation is given in Figure 2. 



Translation(A, B, ε):

1. Initialize an empty binary search tree $S$ with entries of the form $(t, V(t))$, where $t$ is the key. Let $G$ be a uniform grid of spacing $c\varepsilon$, where $c$ is a suitable constant.

2. For each pair of disks $A_i \in A$ and $B_j \in B$ do:

a) Determine all grid points $t_g$ of $G$ such that $t_g \in T_{ij}$, where $T_{ij} = B_j \ominus A_i$. For each such $t_g$ do:

- If $t_g$ is in $S$, then $V(t_g) := V(t_g) + V_{ij}(t_g)$; otherwise, insert $t_g$ in $S$ with $V(t_g) := V_{ij}(t_g)$.

3. Report the grid point $t_{apx}$ that maximizes $V(t_g)$.

Fig. 2. Algorithm Translation(A, B, ε).
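A minimal Python sketch of the voting scheme in Fig. 2, assuming disks are given by their center coordinates. It uses a hash map in place of the binary search tree (expected $O(1)$ instead of $O(\log)$ per update); the names `translation`, `lens_area` and the default constant `c=1.0` are illustrative. Grid points are enumerated over the bounding box of $T_{ij}$, which is a disk of radius 2 centered at $b_j - a_i$, and only those inside $T_{ij}$ receive the vote $V_{ij}(t_g)$.

```python
import math
from collections import defaultdict

def lens_area(d):
    """Overlap area of two unit disks whose centers are d apart."""
    if d >= 2.0:
        return 0.0
    return 2.0 * math.acos(d / 2.0) - (d / 2.0) * math.sqrt(4.0 - d * d)

def translation(A, B, eps, c=1.0):
    """Grid voting over translations. A and B are lists of (x, y) centers of disjoint
    unit disks; the grid spacing is c * eps. Returns (t_apx, V(t_apx))."""
    h = c * eps
    votes = defaultdict(float)            # grid index (gx, gy) -> accumulated V(t_g)
    for ax, ay in A:
        for bx, by in B:
            cx, cy = bx - ax, by - ay     # T_ij = B_j minus A_i: radius-2 disk centered here
            for gx in range(math.ceil((cx - 2) / h), math.floor((cx + 2) / h) + 1):
                for gy in range(math.ceil((cy - 2) / h), math.floor((cy + 2) / h) + 1):
                    # distance between the centers of A_i(t_g) and B_j
                    d = math.hypot(gx * h - cx, gy * h - cy)
                    if d <= 2.0:                          # t_g lies in T_ij
                        votes[(gx, gy)] += lens_area(d)   # add V_ij(t_g)
    (gx, gy), best = max(votes.items(), key=lambda kv: kv[1])
    return (gx * h, gy * h), best

# Two disks in A; B is a copy of A shifted by (5, 5).
t_apx, v = translation([(0.0, 0.0), (3.0, 0.0)], [(5.0, 5.0), (8.0, 5.0)], eps=0.1)
print(t_apx, v)
```

Summing pairwise overlaps is valid here only because both sets consist of disjoint disks, as noted before Lemma 1.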



Theorem 3. Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$ be two sets of disjoint unit disks in the plane. Let $t_{opt}$ be the translation that maximizes $V(t)$. Then, for any given $\varepsilon > 0$, Translation(A, B, ε) computes a translation $t_{apx}$ for which $V(t_{apx}) \ge (1-\varepsilon)V(t_{opt})$, in $O((nm/\varepsilon^2)\log(m/\varepsilon))$ time.

Proof. It follows from Theorem 2 that there must be at least one pair with significant overlap in an optimal translation. This implies that the grid point closest to the optimum must have a pair of disks overlapping, and so the algorithm checks at least one grid translation $t_g$ for which $|t_{opt} - t_g| = O(\varepsilon)$. Let $k_{opt}$ be the number of overlapping pairs $A_i(t_{opt}), B_j$. According to Lemma 1, by setting $\theta = 0$, we have that $V(t_{opt}) - V(t_{apx}) \le V(t_{opt}) - V(t_g) = O(k_{opt}\varepsilon)$. By Theorem 2, we have $V(t_{opt}) = \Theta(k_{opt})$, and the approximation bound follows.

The algorithm considers $O(1/\varepsilon^2)$ grid translations $t_g$ per pair of disks. Each translation is handled in $O(\log(nm/\varepsilon^2))$ time. Thus, since $n \le m$, the total running time is $O((nm/\varepsilon^2)\log(nm/\varepsilon^2)) = O((nm/\varepsilon^2)\log(m/\varepsilon))$. □

4 The Rotational Case 

This section considers the following restricted scenario: set B is fixed, and set A 
can be rotated around the origin. This will be used in the next section, where 
we consider general rigid motions. 
Observe that this problem has a one-dimensional configuration space: the angle of rotation. Consider the function $V : [0, 2\pi) \to \mathbb{R}$ with $V(\theta) := \mathcal{V}((\bigcup A(\theta)) \cap (\bigcup B)) = \sum_{A_i \in A, B_j \in B} V_{ij}(\theta)$. For now, our objective is to guarantee an absolute error on $V$ rather than a relative one. We start with a result that bounds the difference in overlap for two relatively similar rotations. Recall that $r_i$ is the distance of $A_i$'s center to the origin.



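Rotation(A, B, δ):

1. For each pair of disks $A_i \in A$ and $B_j \in B$, choose a set $\Theta_{ij} = \{\theta^1_{ij}, \theta^2_{ij}, \ldots\}$ of rotations as follows. First put the midpoint of $R_{ij}$ in $\Theta_{ij}$, and then put all rotations in $\Theta_{ij}$ that are in $R_{ij}$ and are at distance $k \cdot \delta/(2r_i)$ from the midpoint for some integer $k$. Finally, put both endpoints of $R_{ij}$ in $\Theta_{ij}$.

2. Sort the values $\Theta := \bigcup_{i,j} \Theta_{ij}$, keeping repetitions and breaking ties arbitrarily. Let $\theta_0, \theta_1, \ldots$ be the ordering of $\Theta$. In steps 3 and 4, we will compute a value $\tilde V(\theta)$ for each $\theta \in \Theta$.

3. a) Initialize $\tilde V(\theta_0) := 0$.

b) For each pair $A_i \in A$, $B_j \in B$ for which $\theta_0 \in R_{ij}$ do:

- If $V_{ij}$ is decreasing at $\theta_0$, or $\theta_0$ is the midpoint of $R_{ij}$, then $\tilde V(\theta_0) := \tilde V(\theta_0) + V_{ij}(\theta_{ij})$, where $\theta_{ij}$ is the closest value to $\theta_0$ in $\Theta_{ij}$ with $\theta_{ij} > \theta_0$.
- If $V_{ij}$ is increasing at $\theta_0$, then $\tilde V(\theta_0) := \tilde V(\theta_0) + V_{ij}(\theta_{ij})$, where $\theta_{ij}$ is the closest value to $\theta_0$ in $\Theta_{ij}$ with $\theta_{ij} < \theta_0$.

4. For each $\theta_l$ in increasing order of $l$, compute $\tilde V(\theta_l)$ from $\tilde V(\theta_{l-1})$ by updating the contribution of the pair $A_i, B_j$ defining $\theta_l$, as follows. Let $\theta_l$ be the $s$-th point in $\Theta_{ij}$, that is, $\theta_l = \theta^s_{ij}$.

- If $V_{ij}$ is increasing at $\theta^s_{ij}$, then $\tilde V(\theta_l) := \tilde V(\theta_{l-1}) - V_{ij}(\theta^{s-1}_{ij}) + V_{ij}(\theta^s_{ij})$.
- If $\theta^s_{ij}$ is the midpoint of $R_{ij}$, then $\tilde V(\theta_l) := \tilde V(\theta_{l-1}) - V_{ij}(\theta^{s-1}_{ij}) + V_{ij}(\theta^{s+1}_{ij})$.
- If $V_{ij}$ is decreasing at $\theta^s_{ij}$, then $\tilde V(\theta_l) := \tilde V(\theta_{l-1}) - V_{ij}(\theta^s_{ij}) + V_{ij}(\theta^{s+1}_{ij})$.

5. Report the $\theta_{apx} \in \Theta$ that maximizes $\tilde V(\theta)$.

Fig. 3. Algorithm Rotation(A, B, δ).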



Lemma 2. Let $A_i, B_j$ be any fixed pair of disks. For any given $\delta > 0$ and any $\theta_1, \theta_2$ for which $|\theta_1 - \theta_2| \le \delta/(2r_i)$, we have $|V_{ij}(\theta_1) - V_{ij}(\theta_2)| \le 2\delta$.

For a pair $A_i, B_j$, we define the interval $R_{ij} = \{\theta \in [0, 2\pi) : A_i(\theta) \cap B_j \neq \emptyset\}$ on $S^1$, the circle of rotations. We denote the length of $R_{ij}$ by $|R_{ij}|$. Instead of computing $V_{ij}(\theta)$ at each $\theta \in R_{ij}$, we would like to sample it at regular intervals whose length is at most $\delta/(2r_i)$. At first, it looks as if we would have to take an infinite number of sample points as $r_i \to \infty$. However, as the following lemma shows, $|R_{ij}|$ decreases as $r_i$ increases, and the number of samples we need to consider is bounded.

Lemma 3. For any $A_i, B_j$ with $r_i > 0$, and any given $\delta > 0$, we have $|R_{ij}| / (\delta/(2r_i)) = O(1/\delta)$.
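To make Lemma 3 concrete, here is a Python sketch of step 1 of Rotation for a single pair, assuming $A_i$'s center is not the origin. $R_{ij}$ is obtained from the law of cosines: the rotated center of $A_i$ stays at distance $r_i$ from the origin, and the two disks meet when that center is within distance 2 of $b_j$. The function name `rotation_samples` is ours. The printed sample counts stay bounded as $r_i$ grows, as Lemma 3 predicts.

```python
import math

def rotation_samples(a, b, delta):
    """Theta_ij from step 1 of Rotation for unit disks centered at a (rotated about the
    origin) and b. Returns [] if R_ij is empty. Assumes a is not the origin."""
    r, rho = math.hypot(*a), math.hypot(*b)
    if rho == 0.0:
        # B_j is centered at the origin: the overlap does not depend on theta
        return [0.0] if r <= 2.0 else []
    # the centers are within distance 2 iff the angular difference phi has cos(phi) >= cos_half
    cos_half = (r * r + rho * rho - 4.0) / (2.0 * r * rho)
    if cos_half > 1.0:
        return []                                       # R_ij is empty
    half = math.pi if cos_half <= -1.0 else math.acos(cos_half)
    mid = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])  # midpoint of R_ij
    step = delta / (2.0 * r)
    k = int(half // step)
    return [mid - half] + [mid + j * step for j in range(-k, k + 1)] + [mid + half]

delta = 0.1
for r in (1.0, 3.0, 30.0, 3000.0):
    print(r, len(rotation_samples((r, 0.0), (r, 0.0), delta)))
```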
This lemma implies that we have to consider only $O(1/\delta)$ sample rotations per pair of disks. Thus we need to check $O(nm/\delta)$ rotations in total. It seems that we would have to compute all overlaps at every rotation from scratch, but here Lemma 2 comes to the rescue: in between two consecutive rotations $\theta, \theta'$ defined for a given pair $A_i, B_j$ there may be many other rotations, but if we conservatively estimate the overlap of $A_i, B_j$ as the minimum overlap at $\theta$ and $\theta'$, we do not lose too much. In Figure 3, algorithm Rotation is described in more detail; the value $\tilde V(\theta)$ is the conservative estimate of $V(\theta)$, as just explained.
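A Python sketch of the conservative estimate $\tilde V$. It is deliberately simplified: instead of the incremental sweep of steps 3 and 4, it evaluates $\tilde V$ directly at every sample angle, which costs $O((nm/\delta)^2)$ rather than $O((nm/\delta)\log m)$. For each pair it adds the smaller of the two sampled overlaps bracketing $\theta$. This never overestimates, because $V_{ij}$ is monotone on each side of the midpoint of $R_{ij}$, and the midpoint is itself a sample. Pairs with a center at the origin contribute a constant. All function names are ours.

```python
import bisect
import math

TWO_PI = 2.0 * math.pi

def lens_area(d):
    """Overlap area of two unit disks whose centers are d apart."""
    if d >= 2.0:
        return 0.0
    return 2.0 * math.acos(d / 2.0) - (d / 2.0) * math.sqrt(4.0 - d * d)

def v_ij(a, b, theta):
    """V_ij(theta): overlap of the unit disk at a, rotated by theta about the origin, with the one at b."""
    c, s = math.cos(theta), math.sin(theta)
    return lens_area(math.hypot(c * a[0] - s * a[1] - b[0], s * a[0] + c * a[1] - b[1]))

def pair_samples(a, b, delta):
    """Theta_ij (step 1 of Rotation), shifted so that R_ij starts in [0, 2*pi)."""
    r, rho = math.hypot(*a), math.hypot(*b)
    cos_half = (r * r + rho * rho - 4.0) / (2.0 * r * rho)
    if cos_half > 1.0:
        return []
    half = math.pi if cos_half <= -1.0 else math.acos(cos_half)
    mid = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])   # V_ij peaks here
    step = delta / (2.0 * r)
    k = int(half // step)
    th = [mid - half] + [mid + j * step for j in range(-k, k + 1)] + [mid + half]
    shift = th[0] % TWO_PI - th[0]
    return [t + shift for t in th]

def rotation_estimate(A, B, delta):
    """Return (theta, estimate) maximizing the conservative estimate over all sample angles."""
    const, pairs = 0.0, []
    for a in A:
        for b in B:
            if math.hypot(*a) == 0.0 or math.hypot(*b) == 0.0:
                const += v_ij(a, b, 0.0)   # overlap is independent of theta
                continue
            th = pair_samples(a, b, delta)
            if th:
                pairs.append((th, [v_ij(a, b, t) for t in th]))
    candidates = sorted({t % TWO_PI for th, _ in pairs for t in th}) or [0.0]
    best_theta, best_val = 0.0, -math.inf
    for theta in candidates:
        val = const
        for th, vals in pairs:
            x = th[0] + (theta - th[0]) % TWO_PI   # copy of theta at or after the start of R_ij
            if x > th[-1]:
                continue                          # theta lies outside R_ij
            s = bisect.bisect_right(th, x) - 1    # th[s] <= x
            val += vals[s] if s == len(th) - 1 else min(vals[s], vals[s + 1])
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val

theta, est = rotation_estimate([(3.0, 0.0)], [(0.0, 3.0)], delta=0.05)
print(theta, est)   # theta close to pi/2, estimate slightly below pi
```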

Lemma 4. Let $\theta_{opt}$ be a rotation that maximizes $V(\theta)$ and let $k_{opt}$ be the number of overlapping pairs $A_i(\theta_{opt}), B_j$. For any given $\delta > 0$, the rotation $\theta_{apx}$ reported by Rotation(A, B, δ) satisfies $V(\theta_{opt}) - V(\theta_{apx}) = O(k_{opt}\delta)$ and can be computed in $O((mn/\delta)\log m)$ time.

5 A $(1-\varepsilon)$-Approximation Algorithm for Rigid Motions

Any rigid motion can be described as a translation followed by a rotation around 
the origin. This is used in algorithm RigidMotion described in Figure 4, which 
combines the algorithms for translations and for rotations to obtain a $(1-\varepsilon)$-approximation for rigid motions.



RigidMotion(A, B, ε):

1. Let $G$ be a uniform grid of spacing $c\varepsilon$, where $c$ is a suitable constant. For each pair of disks $A_i \in A$ and $B_j \in B$ do:

a) Set the center of rotation, i.e. the origin, to be $B_j$'s center by translating $B$ appropriately.

b) Let $T_{ij} = B_j \ominus A_i$, and determine all grid points $t_g$ of $G$ such that $t_g \in T_{ij}$. For each such $t_g$ do:

- Run Rotation($A(t_g)$, $B$, $c'\varepsilon$), where $c'$ is an appropriate constant. Let $\theta^g_{apx}$ be the rotation returned. Compute $V(t_g, \theta^g_{apx})$.

2. Report the pair $(t_{apx}, \theta_{apx})$ that maximizes $V(t_g, \theta^g_{apx})$.

Fig. 4. Algorithm RigidMotion(A, B, ε).
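The control flow of Fig. 4 can be illustrated with a simplified Python sketch. For each pair, the pivot is $B_j$'s center, grid translations are restricted to $T_{ij}$, and rotations about the pivot are tried. One substitution is made, and it should be read as a stand-in: Rotation is replaced by uniform angular sampling with spacing chosen so that no translated center of $A$ moves more than about $c\varepsilon$ between samples, and $V$ is evaluated directly. The result is far slower than the algorithm in the text and is meant only for tiny inputs. The names `rigid_motion` and `overlap` are illustrative.

```python
import math

def lens_area(d):
    """Overlap area of two unit disks whose centers are d apart."""
    if d >= 2.0:
        return 0.0
    return 2.0 * math.acos(d / 2.0) - (d / 2.0) * math.sqrt(4.0 - d * d)

def overlap(A, B, t, theta, pivot):
    """V(t, theta): translate A by t, then rotate about pivot by theta. Summing pairwise
    overlaps is valid because the disks within A and within B are disjoint."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for ax, ay in A:
        x, y = ax + t[0] - pivot[0], ay + t[1] - pivot[1]
        x, y = c * x - s * y + pivot[0], s * x + c * y + pivot[1]
        total += sum(lens_area(math.hypot(x - bx, y - by)) for bx, by in B)
    return total

def rigid_motion(A, B, eps, c=1.0):
    """Simplified RigidMotion with uniform angles instead of Rotation.
    Returns (t, theta, pivot, value)."""
    h = c * eps
    best = (None, None, None, -1.0)
    for ax, ay in A:
        for bx, by in B:
            cx, cy = bx - ax, by - ay                 # center of T_ij
            for gx in range(math.ceil((cx - 2) / h), math.floor((cx + 2) / h) + 1):
                for gy in range(math.ceil((cy - 2) / h), math.floor((cy + 2) / h) + 1):
                    t = (gx * h, gy * h)
                    if math.hypot(t[0] - cx, t[1] - cy) > 2.0:
                        continue                      # t_g is not in T_ij
                    # the farthest translated center from the pivot fixes the angular step
                    rmax = max(math.hypot(px + t[0] - bx, py + t[1] - by) for px, py in A)
                    steps = max(1, math.ceil(2 * math.pi * rmax / h))
                    for k in range(steps):
                        theta = 2 * math.pi * k / steps
                        v = overlap(A, B, t, theta, (bx, by))
                        if v > best[3]:
                            best = (t, theta, (bx, by), v)
    return best

# B is A translated by (10, 10) and then rotated by 90 degrees about (10, 10).
t, theta, pivot, v = rigid_motion([(0.0, 0.0), (2.5, 0.0)], [(10.0, 10.0), (10.0, 12.5)], eps=0.25)
print(t, theta, pivot, v)
```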



Theorem 4. Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$, with $n \le m$, be two sets of disjoint unit disks in the plane. Let $I_{t_{opt},\theta_{opt}}$ be a rigid motion that maximizes $V(t,\theta)$. Then, for any given $\varepsilon > 0$, RigidMotion(A, B, ε) computes a rigid motion $I_{t_{apx},\theta_{apx}}$ such that $V(t_{apx},\theta_{apx}) \ge (1-\varepsilon)V(t_{opt},\theta_{opt})$ in $O((n^2m^2/\varepsilon^3)\log m)$ time.

Proof. We will show that $V(t_{apx},\theta_{apx})$ approximates $V(t_{opt},\theta_{opt})$ up to an absolute error. To convert the absolute error into a relative error, and hence show the algorithm's correctness, we use Theorem 2 again.
Let $A_{opt}$ be the set of disks in $A$ that participate in the optimal solution and let $|A_{opt}| = k'_{opt}$. Since the 'kissing' number of unit open disks is six, we have that $k_{opt} \le 6k'_{opt}$, where $k_{opt}$ is the number of overlapping pairs in the optimal solution. Next, imagine that RigidMotion($A_{opt}$, $B$, $\varepsilon$) is run instead of RigidMotion($A$, $B$, $\varepsilon$). Of course, an optimal rigid motion for $A_{opt}$ is an optimal rigid motion for $A$, and the error we make by applying a non-optimal rigid motion to $A_{opt}$ bounds the error we make when applying the same rigid motion to $A$.

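Consider a disk $A_i \in A_{opt}$ and an intersecting pair $A_i(t_{opt},\theta_{opt}), B_j$. At some stage, the algorithm will use $B_j$'s center as the center of rotation, and $I_{t_{opt},\theta_{opt}} = R_{\theta_{opt}} \circ T_{t_{opt}}$. A rotation about $B_j$'s center does not change the distance between the centers of $A_i(t_{opt})$ and $B_j$, so $A_i(t_{opt}) \cap B_j \neq \emptyset$ if and only if $A_i(t_{opt},\theta_{opt}) \cap B_j \neq \emptyset$. Hence, we have that $t_{opt} \in T_{ij}$, and the algorithm will consider some grid translation $t_g \in T_{ij} = B_j \ominus A_i$ for which $|t_{opt} - t_g| = O(\varepsilon)$. By Lemma 1 we have $|V(t_{opt},\theta_{opt}) - V(t_g,\theta_{opt})| = O(k_{opt}\varepsilon)$.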

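Let $\theta^g_{opt}$ be the optimal rotation for $t_g$. Then, $V(t_g,\theta_{opt}) \le V(t_g,\theta^g_{opt})$. In its inner loop, the algorithm computes (by Lemma 4 with $\delta = c'\varepsilon$) a rotation $\theta^g_{apx}$ for which $V(t_g,\theta^g_{opt}) - V(t_g,\theta^g_{apx}) = O(k^g_{opt}\varepsilon)$, where $k^g_{opt}$ is the number of pairs at the optimal rotation of $A_{opt}(t_g)$. Since we are only considering $A_{opt}$, we have that $k^g_{opt} \le 6k'_{opt}$; thus, $V(t_g,\theta^g_{opt}) - V(t_g,\theta^g_{apx}) = O(k_{opt}\varepsilon)$.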

Now, using the fact that $V(t_g,\theta^g_{apx}) \le V(t_{apx},\theta_{apx})$ and that $k'_{opt} \le k_{opt}$, and putting it all together, we get $V(t_{opt},\theta_{opt}) - V(t_{apx},\theta_{apx}) = O(k_{opt}\varepsilon)$. Since the optimal rigid motion can also be defined as a rotation followed by some translation, Theorem 2 holds for $V(t_{opt},\theta_{opt})$ as well. Thus, $V(t_{opt},\theta_{opt}) = \Theta(k_{opt})$ and the approximation bound follows.

The running time of the algorithm is dominated by its first step. We can compute $V(t_g,\theta^g_{apx})$ by a simple plane sweep in $O(m\log m)$ time. Since there are $\Theta(\varepsilon^{-2})$ grid points in each $T_{ij}$, each execution of the loop in the first step takes $O(m + 1/\varepsilon^2 + (1/\varepsilon^2)(nm/\varepsilon)\log m + (1/\varepsilon^2)m\log m) = O((nm/\varepsilon^3)\log m)$ time. This step is executed $nm$ times; hence the algorithm runs in $O((n^2m^2/\varepsilon^3)\log m)$ time. □



5.1 An Improvement for Sets with Small Diameter 

We can modify the algorithm such that its running time depends on the diameter $\Delta$ of set $A$. The main idea is to convert our algorithm into one that is sensitive to the number of pairs of disks in $A$ and $B$ that have approximately the same distance, and then use the combinatorial bounds by Gavrilov et al. [9]. This, and a careful implementation of Rotation, allows us to improve the analysis of the running time of RigidMotion for small values of $\Delta$. In many applications it is reasonable to assume bounds of the type $\Delta = O(n)$ [9], and therefore the result below is relevant; in this case, this result shows that we can compute a $(1-\varepsilon)$-approximation in $O((n^{5/3}m^2/\varepsilon^3)\log n\log m)$ time.
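A short worked check of where the threshold $\Delta = o(n^2/\log^3 n)$ and the $n^{5/3}$ bound come from. It compares the running time of Theorem 5 with the $O((n^2m^2/\varepsilon^3)\log m)$ bound of Theorem 4 and is offered as an illustrative derivation:

$$\frac{m^2 n^{4/3}\Delta^{1/3}}{\varepsilon^3}\log n\log m = o\!\left(\frac{n^2m^2}{\varepsilon^3}\log m\right) \iff \Delta^{1/3}\log n = o\!\left(n^{2/3}\right) \iff \Delta = o\!\left(\frac{n^2}{\log^3 n}\right),$$

$$\Delta = O(n) \;\Longrightarrow\; \frac{m^2 n^{4/3}\Delta^{1/3}}{\varepsilon^3}\log n\log m = O\!\left(\frac{m^2 n^{5/3}}{\varepsilon^3}\log n\log m\right).$$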

Theorem 5. Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$, $n \le m$, be two sets of disjoint unit disks in the plane. Let $\Delta$ be the diameter of $A$, and let $I_{t_{opt},\theta_{opt}}$ be the rigid motion maximizing $V(t,\theta)$. For any $\varepsilon > 0$, we can find a rigid motion $I_{t_{apx},\theta_{apx}}$ such that $V(t_{apx},\theta_{apx}) \ge (1-\varepsilon)V(t_{opt},\theta_{opt})$ in $O((m^2 n^{4/3}\Delta^{1/3}/\varepsilon^3)\log n\log m)$ time.

6 A Monte Carlo Algorithm for Rigid Motions 

In this section we present a Monte Carlo algorithm that computes a $(1-\varepsilon)$-approximation for rigid motions in $O((m^2/\varepsilon^4)\log(m/\varepsilon)\log^2 m)$ time. The algorithm works under the condition that the maximum area of overlap of $A$ and $B$ is at least some constant fraction of the area of $A$.

The algorithm is simple and follows the two-step framework of Section 5, in which an approximation of the best translation is followed by an approximation of the best rotation. However, the first step is now a combination of grid sampling of the space of translations and random sampling of set $A$. This random sampling is based on the observation that the deterministic algorithm of Section 5 will compute a $(1-\varepsilon)$-approximation $k_{opt}$ times, where $k_{opt}$ is the number of pairs of overlapping disks in an optimal solution. Intuitively, the larger this number is, the sooner such a pair will be tried out in the first step. Similar observations were made by Akutsu et al. [2], who gave exact Monte Carlo algorithms for the largest common point set problem.

The second step is based on a direct application of the technique by Cheong 
et al. that allows us to maximize, up to an absolute error, the area of overlap 
under rotation in almost linear time, by computing a point of maximum depth 
in a one-dimensional arrangement.

Rotations. For a given $\varepsilon > 0$, we choose a uniform random sample $S$ of points in $A$ with $|S| = \Theta(\varepsilon^{-2}\log m)$. For a point $s \in S$, we define $W(s) = \{\theta \in [0, 2\pi) \mid s(\theta) \in \bigcup B\}$, where $s(\theta)$ denotes a copy of $s$ rotated by $\theta$. Let $\mathcal{O}_B(S)$ be the arrangement of all regions $W(s)$, $s \in S$; it is a one-dimensional arrangement of unions of rotational intervals.

Lemma 5. Let $\theta_{opt}$ be the rotation that maximizes $V(\theta)$. For any given $\varepsilon > 0$, let $S$ be a uniform random sample of points in $A$ with $|S| \ge c_1\varepsilon^{-2}\log m$, where $c_1$ is an appropriate constant. A vertex $\theta_{apx}$ of $\mathcal{O}_B(S)$ of maximum depth satisfies $V(\theta_{opt}) - V(\theta_{apx}) \le \varepsilon\,\mathcal{V}(\bigcup A)$ with high probability.

Proof. The proof is very similar to the proofs of Lemma 4.1 and Lemma 4.2 by 
Cheong et al. [6]. □ 

Note that $\mathcal{O}_B(S)$ has $O((m/\varepsilon^2)\log m)$ complexity and can be computed in $O((m/\varepsilon^2)\log(m/\varepsilon)\log m)$ time by sorting. A vertex $\theta_{apx}$ of $\mathcal{O}_B(S)$ of maximum depth can be found by a simple traversal of this arrangement.
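The traversal can be sketched in Python as a sorted sweep over arc endpoints, assuming each region $W(s)$ has already been split into closed arcs given as `(start, end)` pairs in radians. The function name and input format are illustrative. Arcs that wrap past $2\pi$ are split in two. At equal angles, starts are processed before ends because the arcs are closed. A point of maximum depth can always be taken at an arc's start, so the reported angle is a vertex of the arrangement.

```python
import math

TWO_PI = 2.0 * math.pi

def max_depth_point(arcs):
    """arcs: closed arcs (start, end) on the circle with start <= end.
    Returns (theta, depth) for a point of maximum depth."""
    events = []                        # (angle, kind, delta); kind 0 = start, 1 = end
    for s, e in arcs:
        if e - s >= TWO_PI:            # the arc covers the whole circle
            events += [(0.0, 0, +1), (TWO_PI, 1, -1)]
            continue
        s0 = s % TWO_PI
        e0 = s0 + (e - s)
        if e0 <= TWO_PI:
            events += [(s0, 0, +1), (e0, 1, -1)]
        else:                          # the arc wraps past 2*pi: split it in two
            events += [(s0, 0, +1), (TWO_PI, 1, -1), (0.0, 0, +1), (e0 - TWO_PI, 1, -1)]
    events.sort()                      # at equal angles, starts come before ends
    depth, best, best_theta = 0, 0, 0.0
    for theta, _, delta in events:
        depth += delta
        if depth > best:
            best, best_theta = depth, theta
    return best_theta, best

print(max_depth_point([(0.0, 1.0), (0.5, 2.0), (1.5, 3.0)]))  # (0.5, 2)
```

Sorting dominates, which matches the $O((m/\varepsilon^2)\log(m/\varepsilon)\log m)$ bound stated above when there are $O((m/\varepsilon^2)\log m)$ arcs.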

We could apply the idea above directly to rigid motions and compute the arrangement of all regions $W(s)$ with respect to rigid motions of $S$. Lemma 5 holds for this arrangement, and a vertex of maximum depth gives an absolute error on $V(t_{opt},\theta_{opt})$. This arrangement has $O(|S|^3m^3) = O((m^3/\varepsilon^6)\log^3 m)$ vertices [7] that correspond, in work space, to combinations of triples of points in $S$ and triples of disks in $B$ such that each point lies on the boundary of a disk. All such possible combinations can be easily found in $O((m^3/\varepsilon^6)\log(m/\varepsilon)\log^3 m)$ time. However, computing the actual rigid motion for any such combination is not trivial, as already explained in Section 2. This complication is avoided by applying the technique to rotations only, thus computing a one-dimensional arrangement instead.

Rigid motions. Since we assume that $V(t_{opt},\theta_{opt}) \ge \alpha\,\mathcal{V}(\bigcup A)$ for some given constant $0 < \alpha < 1$, and since each overlapping pair contributes at most $\pi$ while $\mathcal{V}(\bigcup A) = n\pi$, we have that $k_{opt} \ge \alpha n$. Based also on the fact that the number of disks in $A$ that participate in an optimal solution is at least $k_{opt}/6$, we can easily prove that the probability that $\Theta(\alpha^{-1}\log m)$ uniform random draws of disks from $A$ will all fail to give a disk participating in an optimal solution is polynomially small in $m$. Algorithm RandomRigidMotion is given in Figure 5.
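The $\Theta(\alpha^{-1}\log m)$ bound on the number of draws can be made explicit with a small Python calculation. Each draw hits a disk of the optimal solution with probability at least $\alpha/6$, so $t$ independent draws all miss with probability at most $(1-\alpha/6)^t \le e^{-\alpha t/6}$. The target exponent `c` is a hypothetical parameter, not a value taken from the text.

```python
import math

def draws_needed(alpha, m, c):
    """Smallest t with (1 - alpha/6)**t <= m**(-c). This is at most ceil(6 * c * ln(m) / alpha),
    since -ln(1 - x) >= x."""
    return math.ceil(c * math.log(m) / -math.log1p(-alpha / 6.0))

t = draws_needed(alpha=0.5, m=1000, c=2)
print(t)  # 159
```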



RandomRigidMotion(A, B, α, ε):

1. Choose a uniform random sample $S$ of points in $A$, with $|S| = O(\varepsilon^{-2}\log m)$.

2. Let $G$ be a uniform grid of spacing $c\varepsilon$, where $c$ is a suitable constant. Repeat $\Theta(\alpha^{-1}\log m)$ times:

a) Choose a random $A_i$ from $A$.

b) For each $B_j \in B$ do:

i. Set the center of rotation, i.e. the origin, to be $B_j$'s center by translating $B$ appropriately.

ii. Let $T_{ij} = B_j \ominus A_i$, and determine all grid points $t_g$ of $G$ such that $t_g \in T_{ij}$. For each such $t_g$ do:

- Compute a vertex $\theta^g_{apx}$ of maximum depth in $\mathcal{O}_B(S(t_g))$, and compute $V(t_g, \theta^g_{apx})$.

3. Report the pair $(t_{apx}, \theta_{apx})$ that maximizes $V(t_g, \theta^g_{apx})$.

Fig. 5. Algorithm RandomRigidMotion(A, B, α, ε).
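Step 1 needs points drawn uniformly from $\bigcup A$. Because the disks are disjoint and all have the same area, one way to do this is to pick a disk uniformly at random and then pick a uniform point inside it. Taking the square root of a uniform variable for the radius makes the density uniform in area. A Python sketch follows; the constant `c1` in the sample size is a placeholder for the unspecified constant of Lemma 5.

```python
import math
import random

def sample_points(A, size, rng=random):
    """Uniform random points in the union of the disjoint unit disks centered at A."""
    pts = []
    for _ in range(size):
        cx, cy = rng.choice(A)                  # all disks have equal area
        r = math.sqrt(rng.random())             # sqrt makes the density uniform in area
        phi = rng.uniform(0.0, 2.0 * math.pi)
        pts.append((cx + r * math.cos(phi), cy + r * math.sin(phi)))
    return pts

def sample_size(eps, m, c1=1.0):
    """|S| = ceil(c1 * eps**-2 * ln m); c1 stands in for the constant of Lemma 5."""
    return math.ceil(c1 * math.log(m) / eps ** 2)

S = sample_points([(0.0, 0.0), (5.0, 0.0)], sample_size(0.2, 100), random.Random(0))
print(len(S))
```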



Theorem 6. Let $A = \{A_1, \ldots, A_n\}$ and $B = \{B_1, \ldots, B_m\}$ be two sets of disjoint unit disks in the plane, and let $I_{t_{opt},\theta_{opt}}$ be a rigid motion that maximizes $V(t,\theta)$. Assume that $V(t_{opt},\theta_{opt}) \ge \alpha\,\mathcal{V}(\bigcup A)$ for some given constant $0 < \alpha < 1$. For any given $\varepsilon > 0$, RandomRigidMotion(A, B, α, ε) computes a rigid motion $I_{t_{apx},\theta_{apx}}$ such that $V(t_{apx},\theta_{apx}) \ge (1-\varepsilon)V(t_{opt},\theta_{opt})$ in $O((m^2/\varepsilon^4)\log(m/\varepsilon)\log^2 m)$ time. The algorithm succeeds with high probability.

7 Concluding Remarks 

We have presented approximation algorithms for the maximum area of overlap of two sets of disks in the plane. Theorem 2 on the lower bound on the maximum area of overlap generalizes to three dimensions in a straightforward way. The approximation algorithm for translations generalizes as well, in the following way: the arrangement of $n$ spheres (under assumptions (i) and (ii) of Section 1) has $O(n)$ complexity and can be computed in $O(n\log n)$ time [10]. In addition, there exists a decomposition of this arrangement into $O(n)$ simple cells that can be computed in $O(n\log n)$ time [10]. By using these cells in the voting scheme, the running time of the algorithm is $O((mn/\varepsilon^3)\log(mn/\varepsilon))$.

Although our algorithms for rigid motions generalize to 3D, their running 
times increase dramatically. It would be worthwhile to study this case in detail, 
refine our ideas and give more efficient algorithms. 
Abstract. This paper gives optimal algorithms for the construction of the Nearest Neighbor Embracing Graph (NNE-graph) of a given set of points $V$ of size $n$ in the $k$-dimensional space ($k$-D) for $k = 2, 3$. The NNE-graph provides another way of connecting points in a communication network, which has lower expected degree at each point and shorter total length of connections than the Delaunay graph. In fact, the NNE-graph can also be used as a tool to test whether a point set is randomly generated or has some particular properties.

We show in 2-D that the NNE-graph can be constructed in optimal $O(n^2)$ time in the worst case. We also present an $O(n\log n + nd)$ algorithm, where $d$ is the $\Omega(\log n)$-th largest degree in the output NNE-graph. The algorithm is optimal when $d = O(\log n)$. The algorithm is also sensitive to the structure of the NNE-graph; for instance, when $d = g \cdot \log n$, the number of edges in the NNE-graph is bounded by $O(gn\log n)$ for $1 \le g \le n/\log n$. We finally propose an $O(n\log n + nd\log d^*)$ algorithm for the problem in 3-D, where $d$ and $d^*$ are the $\Omega(\log n)$-th largest vertex degree and the largest vertex degree in the NNE-graph, respectively. The algorithm is optimal when the largest vertex degree $d^*$ of the NNE-graph is $O(\log n/\log\log n)$.

1 Introduction 

The Nearest Neighbor Embracing Graph (NNE-graph for short) of a given set of points $V$ in $k$-D is constructed as follows (throughout this paper, Euclidean distance is assumed): for each point $p$ in $V$, we connect $p$ by edges to its first nearest neighbor $x_1$, its second nearest neighbor $x_2$, and so on, until $p$ is contained in the convex hull of these nearest neighbors. An NNE-graph thus consists of the vertex set $V$ and the edge set of all these connecting edges. The NNE-graph was first introduced in [4] and has very interesting properties. If the set of points $V$ is generated by a Poisson process in a plane (i.e., 2-D), it is proved



T. Hagerup and J. Katajainen (Eds.): SWAT 2004, LNCS 3111, pp. 150-160, 2004.
© Springer-Verlag Berlin Heidelberg 2004




Construction of the Nearest Neighbor Embracing Graph of a Point Set 151 



that the expected degree of a typical point in the NNE-graph is 5, in contrast 
with the expected degree of 6 of a typical point in the Delaunay triangulation 
[3,7] of V. Thus, the NNE-graph provides another way of connecting points in 
a communication network, which has lower expected degree at each point and 
shorter total length of connections. On the other hand, the NNE-graph can also 
be used as a tool to test whether a point set is randomly generated or has some 
particular properties. If the expected degree of the NNE-graph of a set of points 
V deviates from 5, it suggests that the point set V exhibits some regularity (if 
it is lower than 5) or some clustering (if it is higher than 5) [1]. 

In this paper, we present a worst-case optimal algorithm for 2-D point sets and two improved algorithms, one for 2-D and one for 3-D point sets, for finding their NNE-graphs. The algorithms are based on three key techniques: exponential search based on the selection algorithm in [6], a fast k-nearest-neighbor algorithm based on the well-separated pair decomposition in [2], and a fast point-in-convex-hull test based on projection.

In Section 1.1, we show that the worst-case lower bound for constructing the NNE-graph of a given set V of n points is $\Omega(n^2)$, and that the brute-force construction takes $O(n^2\log n)$ time. In Section 2, we present a worst-case optimal $O(n^2)$ time algorithm and an improved $O(n\log n + nd)$ algorithm, where d is the $\Omega(\log n)$-th largest vertex degree in the NNE-graph; that is, d is at least the $(\log n)$-th largest vertex degree in the NNE-graph. This algorithm is optimal when $d = O(\log n)$, because a simpler problem, finding the closest pair of n points in 2-D, which requires $\Omega(n\log n)$ time, can be reduced to the NNE-graph problem. In Section 3, we extend the problem to 3-D and propose an $O(n\log n + nd\log d^*)$ algorithm, where d is the $\Omega(\frac{\log n}{\log\log n})$-th largest vertex degree and $d^*$ is the largest vertex degree in the NNE-graph. The algorithm is optimal when $d^* = O(\frac{\log n}{\log\log n})$. In Section 4, we give some concluding remarks.

1.1 Lower Bound and the Brute Force Approach 

Lemma 1. Any algorithm for constructing the NNE-graph of a given set V of n points requires $\Omega(n^2)$ time in the worst case.

Proof. Consider the following point set V of size n: $n-1$ of the points lie on a convex curve facing a distant single point v, with the distance between any two points on the curve shorter than the distance between v and its nearest neighbor on the curve (refer to Figure 1).

In this scenario, it is easy to see that for any point p on the curve, the NNE-graph of V has to include the edge from p to v as well as the edges from p to all the rest of the points on the curve: p is an extreme point of the points on the convex curve, so it cannot be embraced before v, its farthest neighbor, is added. Thus, the NNE-graph must have $\Omega(n^2)$ edges. By this example, any algorithm for constructing the NNE-graph requires $\Omega(n^2)$ time in the worst case. □







M.Y. Chan et al. 




Fig. 1. The worst case scenario 

The brute-force method for constructing the NNE-graph works as follows. For each point p in V, sort the points in V according to their distances from p and then check, for successive k starting at 2, whether p is contained inside the convex hull of $\{x_1, x_2, \ldots, x_k\}$, where $x_k$ is the k-th nearest neighbor of p, and stop when p lies inside. This brute-force method has a time complexity of $O(n^2\log n)$.
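A minimal Python sketch of the brute-force method, under stated assumptions: points are distinct 2-D tuples, a point that is never embraced simply keeps all of its neighbors (the boundary issue raised in the concluding remarks is ignored), and containment is tested with an angular-gap criterion that the code recomputes from scratch for every k. Because of that recomputation this sketch runs in $O(n^3\log n)$ time rather than the $O(n^2\log n)$ stated above, which presupposes a cheaper incremental test. The helper `lemma1_instance` is hypothetical: it builds an input in the spirit of Lemma 1 (the arc length and the position of v are illustrative choices) on which every pair of points ends up connected.

```python
import math

def in_hull_2d(p, pts):
    """True if p lies in the (closed) convex hull of pts, in 2-D.

    Angular-gap criterion: p is outside the hull exactly when all
    directions from p fit in an open half-plane, i.e. when some gap
    between consecutive directions exceeds pi.
    Assumes no point of pts coincides with p.
    """
    if not pts:
        return False
    angles = sorted(math.atan2(y - p[1], x - p[0]) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))  # wrap-around gap
    return max(gaps) <= math.pi

def nne_graph_bruteforce(points):
    """Edge set {(i, j), i < j} of the NNE-graph of distinct 2-D points."""
    n = len(points)
    edges = set()
    for i, p in enumerate(points):
        # sort all other points by their distance from p
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        chosen = []
        for j in order:              # j is the next nearest neighbour of p
            chosen.append(points[j])
            edges.add((min(i, j), max(i, j)))
            if len(chosen) >= 2 and in_hull_2d(p, chosen):
                break                # p is embraced: stop connecting
    return edges

def lemma1_instance(n):
    """Hypothetical worst-case input: n - 1 points on a short arc of the
    unit circle plus one distant point v = (10, 0) on its convex side."""
    arc = [(math.cos(t), math.sin(t))
           for t in (-0.25 + 0.5 * k / (n - 2) for k in range(n - 1))]
    return arc + [(10.0, 0.0)]
```

For example, `len(nne_graph_bruteforce(lemma1_instance(8)))` is 28, which is $8 \cdot 7 / 2$: every pair of points is joined.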

2 Fast Algorithms for Finding the NNE-Graph of a Point 
Set in 2-D 

In this section, we present an $O(n\log n + dn)$ time algorithm for computing the NNE-graph on an input set V of n points in 2-D, where d is at least the $(\log n)$-th largest vertex degree in the output NNE-graph (i.e., d is the $\Omega(\log n)$-th largest vertex degree). Our algorithm is based on the ideas of exponentially searching the nearest neighbors of each input point and of testing whether the point lies in the convex hull of these nearest neighbors, simply by maintaining a proper convex hull on the boundary of a unit circle. To illustrate the main ideas and key operations of our approach, we first describe a preliminary $O(n^2)$ time algorithm; this algorithm is also used as a subroutine by the final solution. We then give the $O(n\log n + dn)$ time algorithm.

2.1 A Preliminary Algorithm 

For a point $p \in V$, let $x_j$ denote the j-th nearest neighbor of p in V ($j \ge 1$), and let $X^* = \{x_1, x_2, \ldots, x_h\}$ be the set of nearest neighbors of p in V such that the convex hull $CH(X^*)$ contains p but the convex hull $CH(X^* - \{x_h\})$ does not contain p. To obtain an $O(n^2)$ time algorithm, it is sufficient to show how to compute the set $X^*$ for an arbitrary point $p \in V$ in $O(n)$ time. Clearly, $X^*$ can be obtained in $O(n)$ time once the point $x_h$ is identified. Our key idea here is to exponentially search for the point $x_h$ (in $O(n)$ time) using the selection algorithm [6].

For each $j = 1, 2, \ldots, n-1$, let $X_p(j)$ denote the set of the first j nearest neighbors of p in V, i.e., $X_p(j) = \{x_1, x_2, \ldots, x_j\}$. The following “decision”
Fig. 2. (a) The projection of points onto the unit circle $C_p$ (the unfilled little dots are the projected points). (b) If $CH(X_p(j))$ (the solid convex polygon) does not contain p, then the wedge $W_p(X_p(j))$ is less than $\pi$. (c) If $CH(X_p(j))$ does contain p, then $W_p(X_p(j))$ is no smaller than $\pi$.



question plays the role of the “probing” operation in the exponential search for p: given a set $X_p(j)$, does the convex hull $CH(X_p(j))$ contain p?

Instead of computing $CH(X_p(j))$ (in 2-D) explicitly for the decision question, we show below that it is sufficient to compute the smallest wedge $W_p(X_p(j))$ whose apex is p and that contains all points of $X_p(j)$. Note that such a wedge can be represented by two points on the boundary of the unit circle $C_p$ centered at p. In fact, we can project all points of $X_p(j)$ onto the boundary of $C_p$ and view the wedge $W_p(X_p(j))$ as the ‘1-D convex hull’ of the projected points of $X_p(j)$ on the boundary of $C_p$ (see Figure 2(a)).

Lemma 2. For any set $X_p(j)$, p is contained in $CH(X_p(j))$ if and only if $W_p(X_p(j))$ is not smaller than $\pi$.

Proof. If p is not contained in $CH(X_p(j))$, then we can compute the two tangents from p to $CH(X_p(j))$. These two tangents determine the wedge $W_p(X_p(j))$, which is less than $\pi$ (see Figure 2(b)).

If p is contained in $CH(X_p(j))$, then there are three vertices of $CH(X_p(j))$ that form a triangle containing p (or, in the degenerate case, there are two vertices of $CH(X_p(j))$ that define a line segment containing p). Clearly, the triangle defined by the projected points of these three vertices of $CH(X_p(j))$ on the boundary of $C_p$ also contains p (see Figure 2(c)). The smallest wedge containing these three projected points is $\ge \pi$, implying that $W_p(X_p(j))$ is also $\ge \pi$. □
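Lemma 2 can be checked numerically. The sketch below (helper names are hypothetical) computes the angle of the smallest wedge with apex p as $2\pi$ minus the largest angular gap between consecutive directions from p, and compares the criterion "angle $\ge \pi$" with a direct point-in-polygon test on the convex hull built by Andrew's monotone chain algorithm. It assumes that no input point coincides with p and, for brevity, treats hulls that degenerate to a segment as not containing p.

```python
import math

def smallest_wedge_angle(p, pts):
    """Angle of the smallest wedge with apex p containing all of pts.

    The wedge is the complement of the largest angular gap between
    consecutive directions from p, so its angle is 2*pi - max_gap.
    """
    angles = sorted(math.atan2(y - p[1], x - p[0]) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return 2 * math.pi - max(gaps)

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for q in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], q) <= 0:
            lower.pop()
        lower.append(q)
    for q in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], q) <= 0:
            upper.pop()
        upper.append(q)
    return lower[:-1] + upper[:-1]

def in_hull_direct(p, pts):
    """Direct test: p is left of (or on) every counter-clockwise hull edge."""
    hull = convex_hull(pts)
    if len(hull) < 3:
        return False  # degenerate hulls are ignored in this sketch
    return all(cross(hull[k], hull[(k + 1) % len(hull)], p) >= 0
               for k in range(len(hull)))
```

On the triangle $(1,0), (-1,1), (-1,-1)$ both tests report that the origin is inside (the wedge angle is $5\pi/4$), while on $(1,0), (0,1), (-1,0.2)$ both report that it is outside.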

In the following, we describe how to decide whether $W_p(X_p(j)) \ge \pi$ in $O(|X_p(j)|)$ $(= O(j))$ time, i.e., $O(1)$ time for each point in $X_p(j)$.

We compute $W_p(X_p(j))$ iteratively, by adding points to the wedge one by one. We first take two arbitrary points from $X_p(j)$; if these two points define a line segment that contains p (i.e., the degenerate case), then we are done (i.e., the answer to the decision question on $X_p(j)$ is “yes”). Let $W_p$ be the wedge computed so far, and consider the next point q from $X_p(j)$. If q is contained in $W_p$, then $W_p$ is unchanged. If q is not contained in $W_p$, then there are two cases (see Figure 3): (1) the antipodal point A(q) of q (i.e., the point on the



Fig. 3. Illustrating the iterative computation of $W_p(X_p(j))$: (a) when a point q is not contained in the maintained wedge $W_p$ but its antipodal point A(q) is contained in $W_p$; (b) when both q and its antipodal point A(q) are not contained in $W_p$.

boundary of $C_p$ that is on the ray starting at the center point p of $C_p$ and pointing in the direction opposite to q) is contained in $W_p$, and (2) the antipodal point A(q) of q is not contained in $W_p$. In Case (1), there are three projected points of $X_p(j)$ that form a triangle containing p (see Figure 3(a)), and thus we know by the proof of Lemma 2 that $W_p(X_p(j))$ is $\ge \pi$. (Note that there is no need to check the actual angle of $W_p$ when q is considered, since $W_p$ is $\ge \pi$ if and only if Case (1) holds for q.) In Case (2), we update $W_p$ by adding to $W_p$ the wedge from an endpoint of $W_p$ to q that does not contain the antipodal point A(q) of q (see Figure 3(b)). It is easy to see that processing each point q in $X_p(j)$ takes $O(1)$ time. If the iterative process runs through all points of $X_p(j)$ (in $O(j)$ time), then we know that $W_p(X_p(j))$ is $< \pi$, and hence the decision question has a “no” answer.
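The iterative wedge computation can be sketched as follows. This is an illustrative Python version, not the authors' code: the wedge is stored as a start direction plus a counter-clockwise width, each new point is handled in $O(1)$ time, and the function returns "yes" as soon as Case (1) occurs. Starting from the single direction of the first point (width 0) lets the same loop handle the two-point start-up, including the degenerate segment case. Floating-point angles from `math.atan2` are an assumption of the sketch; an exact implementation would use orientation predicates instead.

```python
import math

TWO_PI = 2 * math.pi

def ccw_offset(start, angle):
    """Counter-clockwise angular distance from start to angle, in [0, 2*pi)."""
    return (angle - start) % TWO_PI

def decision_question(p, pts):
    """Does the convex hull of pts contain p?  (2-D, O(1) work per point.)

    The wedge W_p is the counter-clockwise arc of directions
    [start, start + width].  Assumes no point of pts equals p.
    """
    it = iter(pts)
    first = next(it, None)
    if first is None:
        return False
    start = math.atan2(first[1] - p[1], first[0] - p[0])
    width = 0.0
    for x, y in it:
        a = math.atan2(y - p[1], x - p[0])
        d_ccw = ccw_offset(start, a)
        if d_ccw <= width:
            continue                               # q already lies in W_p
        if ccw_offset(start, a + math.pi) <= width:
            return True                            # Case (1): A(q) lies in W_p
        # Case (2): extend W_p towards q on the side that avoids A(q)
        if d_ccw < math.pi:
            width = d_ccw                          # move the end counter-clockwise
        else:
            start, width = a, ccw_offset(a, start + width)  # move the start clockwise
    return False
```

In Case (2) exactly one of the two possible extensions stays below $\pi$: if extending the end counter-clockwise would reach $\pi$ or more, then extending the start clockwise does not, since otherwise A(q) would already lie in $W_p$.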

We are now ready to describe the exponential search process. Without loss of generality, assume that $n = 2^g$ for some integer g and that all distances involved are distinct.

1. Compute the $2^i$-th nearest neighbor of p, for every $i = 0, 1, \ldots, g-1$.

Note that the $2^{g-1}$-th nearest neighbor of p is the median among all $n-1$ nearest neighbors of p. All these $\log n$ $(= g)$ nearest neighbors of p can be computed in altogether $O(n)$ time by repeatedly using the selection algorithm [6]: first compute the median $x_m$ among all $n-1$ nearest neighbors of p, remove all the points of V whose distances to p are larger than the distance between p and $x_m$, and recursively select the median in the remaining set of points, until the remaining set is empty. Since the remaining set is halved in each round, the total work is $O(n + n/2 + n/4 + \cdots) = O(n)$. The above median selection process identifies all $2^i$-th nearest neighbors of p, for all $i = 0, 1, \ldots, g-1$. These g nearest neighbors of p serve as the possible “probing landmarks” of our exponential search procedure.

2. Identify the smallest value i such that the convex hull $CH(X_p(2^i))$ of the set $X_p(2^i)$ contains p.

This is done by asking the decision question on each set $X_p(2^j)$, $j = 1, 2, \ldots, i$, in altogether $O(n)$ time, until a “yes” answer is obtained. Once i is found, we know that the value h in the nearest neighbor set $X^* = \{x_1, x_2, \ldots, x_h\}$ of p satisfies $2^{i-1} < h \le 2^i$, i.e., h is in the range $\{2^{i-1}+1, 2^{i-1}+2, \ldots, 2^i\}$. If “yes” does not appear, h must lie between $2^{g-1}$ and $2^g$. We call $x_{2^{i-1}+1}$ the left endpoint and $x_{2^i}$ the right endpoint of the search range. Note that the wedge $W_p(X_p(2^{i-1}+1))$ has been computed and is available.

3. Perform a binary search on the range $\{2^{i-1}+1, 2^{i-1}+2, \ldots, 2^i\}$ to find the value h, using the median algorithm [6].

We first compute the median $x_m$ among the nearest neighbors $x_{2^{i-1}+1}, x_{2^{i-1}+2}, \ldots, x_{2^i}$ of p, and ask the decision question on $X_p(m)$. Note that since the search range $\{2^{i-1}+1, 2^{i-1}+2, \ldots, 2^i\}$ has $2^{i-1}$ points and the wedge $W_p(X_p(m))$ can be obtained from $W_p(X_p(2^{i-1}+1))$ for the left endpoint of the search range, computing $x_m$ and answering the decision question on $X_p(m)$ take $O(2^{i-1})$ time in total. The binary search process continues iteratively, with each iteration reducing the search range by half and maintaining the wedge for the nearest neighbor set of p defined by the left endpoint of the search range. Therefore, the value of h is obtained in altogether $O(2^{i-1}) = O(n)$ time.
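The exponential and binary search of Steps 1 to 3 can be sketched as below. As a simplifying swap, the sketch sorts the neighbors of p by distance instead of using linear-time selection [6], so it costs $O(n\log n)$ per point rather than $O(n)$, and it re-runs an angular-gap decision test for every probe instead of maintaining the wedge incrementally. The function name `find_h` is hypothetical. The search relies only on the monotonicity of the decision question: once $X_p(j)$ embraces p, so does every larger $X_p(j')$.

```python
import math

def in_hull_2d(p, pts):
    """Decision question via Lemma 2: p in CH(pts) iff the largest gap <= pi."""
    if not pts:
        return False
    angles = sorted(math.atan2(y - p[1], x - p[0]) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return max(gaps) <= math.pi

def find_h(p, others):
    """Smallest h such that the h nearest neighbours of p embrace p.

    Returns None if p is never embraced.  Sorting stands in for the
    linear-time selection of the text.
    """
    nbrs = sorted(others, key=lambda q: math.dist(p, q))
    m = len(nbrs)

    def probe(j):                    # decision question on X_p(j)
        return in_hull_2d(p, nbrs[:j])

    # Step 2: exponential probes at j = 1, 2, 4, ..., capped at m
    hi = 1
    while hi < m and not probe(hi):
        hi *= 2
    hi = min(hi, m)
    if not probe(hi):
        return None
    lo = hi // 2                     # probe(lo) is False (or lo == 0)
    # Step 3: binary search on (lo, hi] for the first "yes"
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For p = (0, 0) and neighbors (1, 0.1), (-1, 0.2), (0.1, 3), (0.2, -4), (5, 5), the probes at j = 1, 2 answer "no" and j = 4 answers "yes"; the binary search then rejects j = 3 and returns h = 4.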

Theorem 1. Given a set V of n points in 2-D, the NNE-graph can be computed in $O(n^2)$ time, which is worst-case optimal.

2.2 The Improved Algorithm 

We show in this subsection how to modify the preliminary algorithm in Subsection 2.1 to obtain an $O(n\log n + dn)$ time solution, where d is at least the $(\log n)$-th largest vertex degree in the output NNE-graph. Note that when the degrees of most vertices of the output NNE-graph (say, all but $c\cdot\log n$ vertices, for some positive constant c) are upper-bounded by $O(\log n)$, our algorithm takes no more than $O(n\log n)$ time, and it is optimal.

The improved algorithm has two key differences from the preliminary algo- 
rithm: 

- It uses Callahan and Kosaraju's k nearest neighbors algorithm [2], instead of the selection algorithm [6], in the exponential search process.

- It searches for the $\Omega(\log n)$ nearest neighbors of each point by processing all points of V simultaneously in each iteration, instead of performing the computations for the points of V one after the other.

Recall that for an integer $k \ge 1$, the k nearest neighbors of a point p in V are the 1st, 2nd, ..., k-th nearest neighbors of p in V. The following result is the basis of our algorithm.

Lemma 3. [2] For a set P of n points in l-D, for any fixed integer $l \ge 1$, (1) a structure called the well-separated pair decomposition of P can be constructed in $O(n\log n)$ time, and (2) given a well-separated pair decomposition of P, the k nearest neighbors of all points in P can be obtained in $O(kn)$ time for any positive integer k.
Our improved algorithm works as follows.

1. Construct a well-separated pair decomposition of the 2-D point set V (* which can be done in $O(n\log n)$ time as stated in part (1) of Lemma 3 *); let $g = 1$.

2. Find the $g\cdot\log n$ nearest neighbors of every point in V. (* which can be done in $O(gn\log n)$ time as stated in part (2) of Lemma 3 *)

3. For every point $p \in V$, ask the ‘decision question’ on the nearest neighbor set $X_p(g\cdot\log n)$ (* as shown in Subsection 2.1, which takes $O(g\log n)$ time per point in V *). If the answer is “yes”, then mark the point p with ‘the search range for the nearest neighbor set $X^*$ is known’; otherwise, mark it with ‘the search range for $X^*$ is not yet known’.

4. If there are only $O(\log n)$ points left in V that are marked ‘search range is not yet known’, then terminate the iterative process; otherwise (more than $O(\log n)$ such points are left in V), double the number of nearest neighbors searched by letting $g \leftarrow 2g$ and go to Step 2.

After the iterative process terminates, the n points in V can be partitioned into two types: (1) those points p whose search range for $X^*$ is already known (there are $n - O(\log n)$ of them), and (2) those points p whose search range for $X^*$ is still not known (there are only $O(\log n)$ of them).

5. For every point p of type (1) ($n - O(\log n)$ points), we perform a binary search to determine $X^*$, which takes $O(g\cdot\log n)$ time. For every point p of type (2) ($O(\log n)$ points), we use the exponential search algorithm of Subsection 2.1 to determine $X^*$, which takes $O(n)$ time. Thus, we find the NNE-graph for V.
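A compact sketch of the doubling scheme follows. It is an illustration under assumptions, not the authors' implementation: the $g\cdot\log n$ nearest neighbors are read off a brute-force sorted order instead of the well-separated pair decomposition of Lemma 3 (so Step 1 has no counterpart here), the stopping threshold "at most $c\log n$ unresolved points" uses a hypothetical constant `c`, points that are already resolved are not re-tested (by monotonicity this does not change the outcome), and a point that is never embraced keeps all of its neighbors.

```python
import math

def in_hull_2d(p, pts):
    """Angular-gap decision question: is p in the convex hull of pts?"""
    if not pts:
        return False
    angles = sorted(math.atan2(y - p[1], x - p[0]) for x, y in pts)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(2 * math.pi - (angles[-1] - angles[0]))
    return max(gaps) <= math.pi

def nne_doubling_sketch(points, c=1.0):
    """Doubling scheme of Steps 2-5 on distinct 2-D points (sketch).

    Returns {i: list of NNE neighbours of point i, nearest first}.
    """
    n = len(points)
    logn = max(1, math.ceil(math.log2(n)))
    order = {i: sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
             for i in range(n)}

    def embraced(i, k):              # decision question on X_p(k), p = points[i]
        return in_hull_2d(points[i], [points[j] for j in order[i][:k]])

    known = {}                       # i -> k with X_p(k) known to embrace p
    unresolved = set(range(n))
    g = 1
    # Steps 2-4: test g*log n neighbours, double g while too many remain
    while len(unresolved) > c * logn and g * logn < n - 1:
        k = g * logn
        for i in list(unresolved):
            if embraced(i, k):
                known[i] = k
                unresolved.discard(i)
        g *= 2
    # Step 5: binary search inside the known range, or over all neighbours
    result = {}
    for i in range(n):
        hi = known.get(i, n - 1)
        if not embraced(i, hi):
            result[i] = order[i]     # never embraced: keep every neighbour
            continue
        lo = 0                       # invariant: not embraced(lo), embraced(hi)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if embraced(i, mid):
                hi = mid
            else:
                lo = mid
        result[i] = order[i][:hi]
    return result
```

Each returned neighbor list is minimal: dropping its farthest element leaves a set that no longer embraces the point.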

The correctness of this algorithm follows from the same observations as in Subsection 2.1. The time analysis of this algorithm is as follows. The initialization in Step 1 takes $O(n\log n)$ time (for constructing a well-separated pair decomposition of the point set V). The iterative process in Steps 2, 3, and 4 takes $O(dn)$ time, where we let $d = g\cdot\log n$. To see this, note that the number of nearest neighbors to be searched for each point is doubled after Step 4 for the next iteration, so g takes the values $1, 2, 4, \ldots$ with $1 \le g \le \frac{n}{\log n}$, and the cost of an iteration is proportional to its current value of g; the total is therefore a geometric sum dominated by the last iteration. In Step 5, the determination of $X^*$ for each point p of type (1) takes $O(g\cdot\log n)$ time, and there are $n - O(\log n)$ type (1) points in V; the determination of $X^*$ for each point p of type (2) takes $O(n)$ time, and there are $O(\log n)$ type (2) points in V. Therefore, the determination of $X^*$ for all n points in V takes altogether $O(n\log n + g\cdot n\log n)$ time. In total, the running time of the algorithm is $O(n\log n + dn)$ for $d = g\cdot\log n$.
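The geometric-sum argument behind the $O(dn)$ bound for Steps 2 to 4 can be written out as follows, assuming the doubling stops at $g = 2^T$ and that iteration t searches $2^t\log n$ neighbors per point (Step 2 then costs $O(2^t n\log n)$ by Lemma 3(2), and Step 3 costs $O(2^t\log n)$ per point):

```latex
\sum_{t=0}^{T} O\!\left(2^{t}\, n\log n\right)
  = O\!\left((2^{T+1}-1)\, n\log n\right)
  = O(g\, n\log n) = O(dn),
  \qquad g = 2^{T},\; d = g\log n .
```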

Theorem 2. Given a set V of n points in 2-D, the NNE-graph can be computed in $O(n\log n + dn)$ time, where d is the $\Omega(\log n)$-th largest degree of the NNE-graph.

3 The 3-D Case 

It is possible to extend our 2-D approach to solving the problem on point sets in l-D for any fixed integer $l \ge 3$. In this section, we show this extension to 3-D.
In particular, we present an $O(n\log n + dn\log d^*)$ time algorithm for computing the NNE-graph on an input set V of n points in 3-D, where d is the $\Omega(\frac{\log n}{\log\log n})$-th largest degree and $d^*$ is the largest degree in the NNE-graph. The ideas of our 3-D algorithm are very similar to those for the 2-D case (i.e., exponential search and maintaining the convex hull of ‘2-D’ points on the surface of a unit sphere). We first illustrate the main ideas and key operations of our 3-D approach by presenting a preliminary $O(n^2 + m\log d^*)$ time algorithm, where m is the number of edges of the output NNE-graph. We then give the $O(n\log n + dn\log d^*)$ time algorithm.

For a point p of V in 3-D, a unit sphere $S_p$ centered at p is used to project the points of $V - \{p\}$ onto the surface of $S_p$ along the rays emitted from p. The pyramid with apex p for a 3-D point set X' is the convex hull of p and X'. Furthermore, the convex hull of the points of X' projected onto the surface of the unit sphere $S_p$ is the base of this pyramid; let $CH_{S_p}(X')$ denote that base. The pyramid is a convex polygonal cone. Clearly, for any set $X_p(j)$, p is contained in $CH(X_p(j))$ if and only if p is contained in a tetrahedron defined by four vertices of $CH(X_p(j))$. Therefore, we have the following lemma.

Lemma 4. For any set $X_p(j)$, p is contained in $CH(X_p(j))$ in 3-D if and only if the following holds: there exists no semi-sphere of $S_p$ that properly contains $CH_{S_p}(X_p(j))$; in other words, there exists a plane P in 3-D that passes through the center p of $S_p$ such that the (2-D) convex hull $CH(P \cap CH_{S_p}(X_p(j)))$ of the intersection of P and $CH_{S_p}(X_p(j))$ contains p (clearly, the convex hull $CH(P \cap CH_{S_p}(X_p(j)))$ lies on the plane P).

Proof. If p is contained in $CH(X_p(j))$, then there exist four vertices $v_1$, $v_2$, $v_3$, and $v_4$ of $CH(X_p(j))$ that define a tetrahedron T containing p. At least one of the four faces of the tetrahedron T, say, the face F defined by $v_1$, $v_2$, and $v_3$, corresponds to a triangle of the convex hull $CH_{S_p}(X_p(j))$ on the surface of $S_p$ (by projecting the face F onto the surface of $S_p$ from p). Then any plane P that passes through both p and $v_4$ and cuts the face F determines a triangle on P containing p.

If p is not contained in $CH(X_p(j))$, then there exists a plane P' in 3-D that separates p from $CH(X_p(j))$ (i.e., p and $CH(X_p(j))$ are on opposite sides of P'). The plane P' thus defined intersects the surface of $S_p$ in a circle C(P') such that C(P') is properly contained in a certain semi-sphere of $S_p$. That semi-sphere of $S_p$ also properly contains $CH_{S_p}(X_p(j))$, which is the projection of $CH(X_p(j))$ from p inside the circle C(P') on the surface of $S_p$. □

Note that in the degenerate case, p can be contained in a line segment defined by some points of $P \cap CH_{S_p}(X_p(j))$, where P is a plane as defined in Lemma 4.
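For small inputs, the tetrahedron characterization used above (Carathéodory's theorem in 3-D) gives a direct, if slow, containment test that can be used to cross-check an implementation of the spherical-hull procedure. The sketch below is a brute-force $O(k^4)$ substitute, not the $O(k\log k)$ method the text describes next; it assumes points in general position and simply skips flat tetrahedra, so the degenerate case noted above is not handled. The helper names are hypothetical.

```python
from itertools import combinations

def orient3d(a, b, c, d):
    """det[b - a, c - a, d - a]: its sign says on which side of the plane
    through a, b, c the point d lies (zero if the four are coplanar)."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    w = [d[k] - a[k] for k in range(3)]
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def in_tetrahedron(p, a, b, c, d):
    """Closed containment of p in the tetrahedron abcd; flat ones are skipped."""
    if orient3d(a, b, c, d) == 0:
        return False
    faces = [(a, b, c, d), (a, b, d, c), (a, c, d, b), (b, c, d, a)]
    # p must lie on the same side of each face plane as the opposite vertex
    return all(orient3d(x, y, z, p) * orient3d(x, y, z, w) >= 0
               for x, y, z, w in faces)

def in_hull_3d(p, pts):
    """p in CH(pts) iff p lies in a tetrahedron spanned by four of pts."""
    return any(in_tetrahedron(p, *quad) for quad in combinations(pts, 4))
```

For example, the origin is the centroid of $(1,0,0), (0,1,0), (0,0,1), (-1,-1,-1)$ and is reported as contained, whereas no tetrahedron of points lying in the half-space $z \ge 1$ can contain it.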

Again, the decision question on any point $p \in V$ and any nearest neighbor set $X_p(j)$ of p in V is crucial to our exponential search process in 3-D. Although Lemma 4 seemingly implies that some complicated testing is required, it is actually quite simple to perform the test. In fact, the decision question on p and $X_p(j)$ can be answered in $O(|X_p(j)|\log|X_p(j)|) = O(j\log j)$ time (i.e., $O(\log j)$ time per point in $X_p(j)$).
Fig. 4. The nearest neighbors of p in 3-D are projected onto a unit sphere.



We first compute the ‘2-D’ convex hull $CH_{S_p}(X_p(j))$ on the surface of the unit sphere $S_p$. We compute $CH_{S_p}(X_p(j))$ iteratively, as follows. We first take two arbitrary points from $X_p(j)$; if these two points define a line segment that contains p (i.e., the degenerate case), then we are done (i.e., the answer to the decision question on $X_p(j)$ is “yes”). Let $CH_{S_p}(X')$ be the convex hull on the surface of $S_p$ computed so far, with $X' \subseteq X_p(j)$, and consider the next point q from $X_p(j)$. Here, unless otherwise specified, we do not distinguish the point q from its projected point on the surface of $S_p$. If q is contained in $CH_{S_p}(X')$, then $CH_{S_p}(X')$ is unchanged, i.e., $CH_{S_p}(X' \cup \{q\}) = CH_{S_p}(X')$. Note that checking whether q is in $CH_{S_p}(X')$ can easily be done in $O(\log|CH_{S_p}(X')|)$ time. If q is not contained in $CH_{S_p}(X')$, then there are two cases: (1) the antipodal point A(q) of q (i.e., the point on the surface of $S_p$ that is on the ray starting at the center point p of $S_p$ and pointing in the direction opposite to q) is contained in $CH_{S_p}(X')$, and (2) the antipodal point A(q) of q is not contained in $CH_{S_p}(X')$. In Case (1), there are four projected points of $X_p(j)$ (one of which is q) that form a tetrahedron containing p, and thus we know by Lemma 4 that no semi-sphere of $S_p$ can properly contain $CH_{S_p}(X_p(j))$. (Note that checking whether Lemma 4 holds in this situation is simply done by deciding whether $A(q) \in CH_{S_p}(X')$, which can easily be carried out in $O(\log|CH_{S_p}(X')|)$ time.) In Case (2), we compute the two common tangents between $CH_{S_p}(X')$ and q on the surface of $S_p$, update $CH_{S_p}(X')$ accordingly, and let $X' = X' \cup \{q\}$ (note that the resulting $CH_{S_p}(X')$ should not contain the antipodal point A(q) of q). Computing these two common tangents takes $O(\log|CH_{S_p}(X')|)$ time. Therefore, processing each point q in $X_p(j)$ takes $O(\log|X_p(j)|) = O(\log j)$ time. If the iterative process runs through all points of $X_p(j)$ (in $O(j\log j)$ time), then we know that some semi-sphere of $S_p$ properly contains $CH_{S_p}(X_p(j))$, and hence the decision question has a “no” answer.
The preliminary algorithm for the 3-D case performs an exponential search separately for each point p of V to compute the nearest neighbor set $X^*$ using the selection algorithm. Due to the nature of the exponential search and the $O(j\log j)$ time bound of the decision question on each point p and its nearest neighbor set $X_p(j)$, it takes $O(n + |X^*|\log|X^*|)$ time to compute $X^*$ for each point p in V. Note that $|X^*|$ is not larger than the degree of p in the output NNE-graph and that $\sum_{p \in V}|X_p^*| \le 2m$, where $X_p^*$ denotes the set $X^*$ of p and m is the number of edges. Therefore, we have the following result.

Theorem 3. Given a set V of n points in 3-D, the NNE-graph can be computed in $O(n^2 + m\log d^*)$ time, where m is the number of edges and $d^*$ is the largest degree of the output NNE-graph.
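Summing the per-point bound makes the $m\log d^*$ term explicit; the derivation uses only $|X_p^*| \le \deg(p) \le d^*$ and $\sum_{p}|X_p^*| \le 2m$:

```latex
\sum_{p\in V} O\!\left(n + |X_p^*|\log|X_p^*|\right)
  \le O(n^2) + O(\log d^*)\sum_{p\in V}|X_p^*|
  \le O(n^2) + O(2m\log d^*)
  = O(n^2 + m\log d^*).
```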



Our improved algorithm for the 3-D case is also similar to the corresponding one for the 2-D case. Our main tool is the 3-D version of Lemma 3: Step 1 establishes a well-separated pair decomposition of the point set V in $O(n\log n)$ time. We also set the number of nearest neighbors searched in Step 2 to be $g\cdot\frac{\log n}{\log\log n}$, with $g = 1$ initially.

In the iterative process, Steps 2, 3, and 4 are the same as in the 2-D case. Searching the $g\cdot\frac{\log n}{\log\log n}$ nearest neighbors ($X_p^*$) for each point p in V whose search range is already known takes $O\left(g\cdot\frac{\log n}{\log\log n}\cdot\log\left(g\cdot\frac{\log n}{\log\log n}\right)\right)$ time, and in total $O\left(ng\cdot\frac{\log n}{\log\log n}\cdot\log\left(g\cdot\frac{\log n}{\log\log n}\right)\right)$ time for all $n - O\left(\frac{\log n}{\log\log n}\right)$ such points in V. Searching the nearest neighbor set $X_p^*$ for each point p in V whose search range is not yet known takes $O(n + d\log d^*)$ time, and in total $O\left(n\cdot\frac{\log n}{\log\log n} + \frac{\log n}{\log\log n}\cdot d\log d^*\right)$ time for all $O\left(\frac{\log n}{\log\log n}\right)$ such points in V, where $d^*$ is the largest degree of the output NNE-graph. Therefore, in summary, we have the following result.



Theorem 4. Given a set V of n points in 3-D, the NNE-graph can be computed in $O(n\log n + nd\log d^*)$ time, where d is the $\Omega(\frac{\log n}{\log\log n})$-th largest degree of the NNE-graph and $d^*$ is the largest degree of the NNE-graph.
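The optimality condition stated in the introduction follows from Theorem 4 by a short calculation, using $d \le d^*$ and assuming $d^* \le c\,\frac{\log n}{\log\log n}$ for some constant $c \ge 1$:

```latex
nd\log d^* \le n d^*\log d^*
  \le n\cdot c\,\frac{\log n}{\log\log n}\cdot\log\!\left(c\,\frac{\log n}{\log\log n}\right)
  \le n\cdot c\,\frac{\log n}{\log\log n}\,\bigl(\log\log n + \log c\bigr)
  = O(n\log n).
```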



4 Concluding Remarks 

In this paper, we present a worst-case optimal $O(n^2)$ algorithm for finding the NNE-graph of a set of n points in 2-D. We also present algorithms for finding the NNE-graphs of a point set in both 2-D and 3-D. These algorithms are optimal when the $(\log n)$-th largest vertex degree of the NNE-graph is bounded by $O(\log n)$ in the 2-D case and when the $(\frac{\log n}{\log\log n})$-th largest vertex degree of the NNE-graph is bounded by $O(\frac{\log n}{\log\log n})$ in the 3-D case.

Thus far, we have ignored points on the boundary of the point set V. Points near the boundary may not be embraced by the convex hull of their neighbors (points on the boundary of the convex hull of V never are). So how should points near the boundary be treated in the NNE-graph? For statistical applications, it is important to study the unbiased statistical properties of the NNE-graph of a set of points within a bounded region. Working within a bounded region, our procedure for finding the NNE-neighbors of p, where p is at distance d from the boundary of the given bounded region, can stop when the nearest neighbor being considered is at distance greater than d from p. Another way to deal with a boundary point b is to assign b degree j and to connect b to its j nearest neighbors, where j is the average degree of the NNE-graph of the internal point set V. This is consistent with the connections among the internal points in V.

Our improved algorithms are sensitive to the structure of the NNE-graph. For example, when $d = g\cdot\log n$, the size of the NNE-graph is bounded above by $O(gn\log n)$ for $1 \le g \le \frac{n}{\log n}$. However, d cannot be used to bound the size of the NNE-graph from below. It would be interesting to find an $O(n\log n + m)$ output-sensitive algorithm, where m is the number of edges in the NNE-graph.
Abstract. Given a set V of n vertices and a set $\mathcal{E}$ of m edge pairs, we define a graph family $\mathcal{G}(V,\mathcal{E})$ as the set of graphs that have vertex set V and contain exactly one edge from every pair in $\mathcal{E}$. We want to find a graph in $\mathcal{G}(V,\mathcal{E})$ that has the minimal number of connected components. We show that, if the edge pairs in $\mathcal{E}$ are non-disjoint, the problem is NP-hard even if the union of the graphs in $\mathcal{G}(V,\mathcal{E})$ is planar. If the edge pairs are disjoint, we provide an $O(n^2m)$-time algorithm that finds a graph in $\mathcal{G}(V,\mathcal{E})$ with the minimal number of connected components.



1 Introduction 

Description of the problem. Given a set V of n vertices and a set $\mathcal{E}$ of m edge pairs, we define a graph family $\mathcal{G}(V,\mathcal{E})$ as the set of graphs that have vertex set V and contain exactly one edge from every pair in $\mathcal{E}$. The maximal connectivity problem (MCP) is the problem of finding a graph $G^*$ in $\mathcal{G}(V,\mathcal{E})$ that has the minimal number of connected components. We call such a graph $G^*$ maximally connected or maximal. Edelsbrunner [4] proposed MCP as a graph-theoretic formulation of a problem arising in the repair of self-intersections of triangulated surfaces [5]. We show that, if the edge pairs in $\mathcal{E}$ are non-disjoint, MCP is NP-hard. If the edge pairs are disjoint, we provide a polynomial-time solution for MCP. We obtain a maximal graph in $\mathcal{G}(V,\mathcal{E})$ by starting with an arbitrary graph in $\mathcal{G}(V,\mathcal{E})$ and making only local changes, so-called edge flips, that do not increase the number of connected components in the graph. We also study the question of whether any graph in $\mathcal{G}(V,\mathcal{E})$ can be transformed into any other graph in $\mathcal{G}(V,\mathcal{E})$ using edge flips while guaranteeing an upper bound on the number of connected components in every intermediate graph.

Motivation and related work. Edge flips have received considerable attention, 
particularly in the context of geometric graphs such as triangulations and 
pseudo-triangulations of planar point sets. The length of sequences of diago- 
nal flips in triangulations is studied in [7,8,11]. Negami [14] studies diagonal 
flips in triangulated planar graphs. Diagonal flips in pseudo-triangulations are 
studied in [1,3]. The expected length of flip sequences in the randomized incre- 
mental construction of Delaunay triangulations in two and higher dimensions is 
studied in [6,10,12,13,15]. Aichholzer et al. [2] study different transformations 
of non-intersecting spanning trees of point sets in the plane and the number 
of such transformations required to obtain a minimum spanning tree from any
non-intersecting spanning tree of a point set. 

The reason for this interest in flip sequences is that they are combinatori- 
ally interesting and potentially lead to efficient algorithms for solving certain 
optimization problems on graphs. In general, one considers a family $\mathcal{G}$ of graphs that have the same vertex set and usually the same number of edges. An edge flip removes an edge e from a graph G in $\mathcal{G}$ and replaces it with another edge e' so that the resulting graph is also in $\mathcal{G}$. Other types of flips that introduce or remove edges have been studied, for instance, in [1,3]. Often, the structure of the graphs in $\mathcal{G}$ guarantees that every flip happens in a small subgraph of G; that is, flips are local transformations. For example, the flip of an edge e in a triangulation of a planar point set replaces edge e with the other diagonal of the quadrilateral obtained by removing e. If we have a certain quality measure of the graphs in $\mathcal{G}$, such as Delaunayhood (for triangulations) or the number of faces (for pseudo-triangulations), it is interesting to ask whether a globally optimal graph in $\mathcal{G}$ can be obtained by making only local changes that improve the quality of the graph. If the answer is affirmative and the number of required flips is small, efficient algorithms result, because local transformations of the graphs can often be implemented efficiently.

Terminology and notation. We denote the number of connected components of a graph G by $\omega(G)$. We define $\omega(\mathcal{G}(V,\mathcal{E})) = \min\{\omega(G) : G \in \mathcal{G}(V,\mathcal{E})\}$; in particular, a graph $G^* \in \mathcal{G}(V,\mathcal{E})$ is maximal if $\omega(G^*) = \omega(\mathcal{G}(V,\mathcal{E}))$. We say that a family $\mathcal{G}(V,\mathcal{E})$ is k-thick if every edge appears in at most k pairs in $\mathcal{E}$; in particular, $\mathcal{G}(V,\mathcal{E})$ is 1-thick if the edge pairs in $\mathcal{E}$ are pairwise disjoint. We define k-MCP to be MCP restricted to k-thick families; planar MCP is MCP restricted to families $\mathcal{G}(V,\mathcal{E})$ such that the graph $(V, \bigcup_{P\in\mathcal{E}} P)$ is planar.

For a 1-thick family $\mathcal{G}(V,\mathcal{E})$, the flip of an edge e in a graph $G \in \mathcal{G}(V,\mathcal{E})$ removes edge e from G and replaces it with the other edge $\bar{e}$ in the edge pair $P \in \mathcal{E}$ that contains e. We call $\bar{e}$ the complementary edge or complement of e and denote the graph $(V, (E(G) \setminus e) \cup \{\bar{e}\})$ obtained by flipping edge e in G by G(e). More generally, we denote the graph obtained from G by flipping edges $e_1, \ldots, e_q$ by $G(e_1, \ldots, e_q)$. We call the flip of an edge e splitting, stable, or merging depending on whether $\omega(G(e))$ is greater than, equal to, or less than $\omega(G)$. A flip sequence $e_1, \ldots, e_q$ is merging if every flip in the sequence is stable or merging and $\omega(G(e_1, \ldots, e_q)) < \omega(G)$. A merging flip sequence $e_1, \ldots, e_q$ is maximizing if $G(e_1, \ldots, e_q)$ is maximal.
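To make the definitions of $\omega$ and of splitting, stable, and merging flips concrete, here is a small Python sketch for 1-thick families. The encoding is a hypothetical choice: $\mathcal{E}$ is a list of pairs $(e, \bar{e})$ of undirected edges on vertices $0, \ldots, n-1$, and a graph in $\mathcal{G}(V,\mathcal{E})$ is a 0/1 vector selecting one edge per pair, so flipping the edge taken from pair i toggles entry i. $\omega$ is computed with a union-find structure, and `omega_family` finds $\omega(\mathcal{G}(V,\mathcal{E}))$ by exhaustive enumeration, which is exponential in m and only meant for tiny examples.

```python
from itertools import product

def omega(n, edges):
    """Number of connected components of ({0, ..., n-1}, edges), via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

def graph_of(pairs, choice):
    """Edge list of the graph that takes pair[choice[i]] from the i-th pair."""
    return [pair[c] for pair, c in zip(pairs, choice)]

def classify_flip(n, pairs, choice, i):
    """Classify flipping the edge chosen from pair i as in the text."""
    flipped = list(choice)
    flipped[i] = 1 - flipped[i]               # replace e by its complement
    before = omega(n, graph_of(pairs, choice))
    after = omega(n, graph_of(pairs, flipped))
    if after > before:
        return "splitting"
    return "stable" if after == before else "merging"

def omega_family(n, pairs):
    """omega(G(V, E)) by exhaustive search over all 2^m graphs (tiny m only)."""
    return min(omega(n, graph_of(pairs, ch))
               for ch in product((0, 1), repeat=len(pairs)))
```

For example, with `pairs = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((1, 2), (0, 3))]`, the choice `[0, 0, 0]` is a triangle plus an isolated vertex ($\omega = 2$), and flipping the edge taken from the third pair yields a star, so that flip is merging.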

In Section 5, we consider $\mathcal{G}(V,\mathcal{E})$ to be itself a graph whose vertices are the graphs in $\mathcal{G}(V,\mathcal{E})$ and such that there is an edge between two graphs $G_1$ and $G_2$ if $G_2 = G_1(e)$ for some edge $e \in G_1$. We use $\mathcal{G}(V,\mathcal{E},k)$ to denote the subgraph of $\mathcal{G}(V,\mathcal{E})$ induced by all vertices $G \in \mathcal{G}(V,\mathcal{E})$ such that the graph G has at most k connected components.
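The graph $\mathcal{G}(V,\mathcal{E},k)$ can likewise be explored by brute force on tiny families. The sketch below uses the same hypothetical encoding (one 0/1 entry per pair): it keeps the choice vectors with at most k components and runs a breadth-first search over single flips. On the 4-vertex family in the usage note, $\omega(\mathcal{G}(V,\mathcal{E})) = 1$ and $\mathcal{G}(V,\mathcal{E},1)$ consists of four mutually non-adjacent graphs, whereas $\mathcal{G}(V,\mathcal{E},2)$ is connected; this matches the connectivity statements in the next paragraph.

```python
from collections import deque
from itertools import product

def omega(n, edges):
    """Number of connected components of ({0, ..., n-1}, edges), via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return components

def restricted_flip_graph_connected(n, pairs, k):
    """Is G(V, E, k) connected?  Brute force over all 2^m choice vectors."""
    def components(choice):
        return omega(n, [pair[c] for pair, c in zip(pairs, choice)])

    allowed = {ch for ch in product((0, 1), repeat=len(pairs))
               if components(ch) <= k}
    if not allowed:
        return True                            # empty graph: trivially connected
    root = next(iter(allowed))
    seen, queue = {root}, deque([root])
    while queue:                               # BFS over single edge flips
        ch = queue.popleft()
        for i in range(len(pairs)):
            nxt = ch[:i] + (1 - ch[i],) + ch[i + 1:]
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(allowed)
```

For example, with `pairs = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((1, 2), (0, 3))]`, `restricted_flip_graph_connected(4, pairs, 1)` returns `False` and `restricted_flip_graph_connected(4, pairs, 2)` returns `True`.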

Our results. In Section 2, we prove that planar k-MCP is NP-hard, for any $k > 1$. In Section 3, we show that every graph in a 1-thick family $\mathcal{G}(V,\mathcal{E})$ has a maximizing sequence of at most $n-1$ flips. An algorithm to find such a sequence in $O(n^2m)$ time is presented in Section 4. The central part of the algorithm is an $O(nm)$-time algorithm that finds a merging sequence of at most $n-1$ flips for a given non-maximal graph. In Section 5, we study the connectivity of $\mathcal{G}(V,\mathcal{E},k)$, for any 1-thick family $\mathcal{G}(V,\mathcal{E})$. We show that the graph $\mathcal{G}(V,\mathcal{E},k)$ is connected and has diameter at most $n+m-1$, for any $k > \omega(\mathcal{G}(V,\mathcal{E}))$; that is, for any two graphs $G_1, G_2 \in \mathcal{G}(V,\mathcal{E},k)$, there exists a sequence $e_1, e_2, \ldots, e_q$ of at most $n+m-1$ edges such that $G_1(e_1, e_2, \ldots, e_q) = G_2$ and $\omega(G_1(e_1, e_2, \ldots, e_i)) \le k$ for all $1 \le i \le q$. The graph $\mathcal{G}(V,\mathcal{E},\omega(\mathcal{G}(V,\mathcal{E})))$ is not necessarily connected.

2 NP-Hardness of Planar k-MCP

Our proof that planar $k$-MCP is NP-hard for $k > 1$ uses a linear-time reduction
from 3-SAT to planar 2-MCP. First we recall the necessary terminology. Given
a Boolean variable $x$, we denote its negation by $\bar{x}$. A literal is a Boolean variable
or its negation. A clause is a disjunction of literals: $C = \lambda_1 \vee \lambda_2 \vee \cdots \vee \lambda_k$.
A Boolean formula $F$ is in conjunctive normal form (CNF) if it is of the form
$F = C_1 \wedge C_2 \wedge \cdots \wedge C_m$, where $C_1, \ldots, C_m$ are clauses. Formula $F$ is in 3-CNF
if every clause $C_i$, $1 \le i \le m$, contains exactly three literals. In this case, we
denote the literals in $C_i$ by $\lambda_{i,1}$, $\lambda_{i,2}$, and $\lambda_{i,3}$. We denote the Boolean variables
in $F$ by $x_1, \ldots, x_n$. It is well known that the problem of deciding whether a
given formula in 3-CNF is satisfiable, 3-SAT, is NP-complete [9]. Hence, if we
can provide a polynomial-time reduction from 3-SAT to MCP, MCP is NP-hard.

An important element used in a number of constructions in this paper is the
“connector graph” shown in Figure 1. This graph is planar. Its edges are grouped
into disjoint edge pairs as indicated by the numbering in Figure 1. It is easy to
verify that, no matter which edge we choose from each edge pair, the resulting
subgraph is connected. Hence, we can distinguish two of the vertices, $a$ and $b$,
and think of the graph as a “permanent” edge between vertices $a$ and $b$; that is,
this edge has to be present in every graph in the graph family. We will represent
such permanent edges as squiggly edges in subsequent figures.

Given a formula $F$ in 3-CNF with $n$ variables $x_1, \ldots, x_n$ and $m$ clauses
$C_1, \ldots, C_m$, we construct a graph $G'(F)$ and assign its edges to appropriate
edge pairs to obtain a family $\mathcal{G}(F)$ of subgraphs of $G'(F)$. We ensure that $G'(F)$
is planar, that no edge of $G'(F)$ is in more than two pairs, and that $\mathcal{G}(F)$
contains a connected graph if and only if $F$ is satisfiable. For every literal $\lambda$,
let $\kappa(\lambda)$ be the number of clauses containing $\lambda$. For a variable $x_i$, we define




Fig. 1. The connector graph. Two edges with the same number form an edge pair. 




Fig. 2. The graph $G'(F)$ for a formula $F$ in 3-CNF with four clauses over the
variables $x_1, \ldots, x_4$. Regular edges are labelled with their names. The small italic
labels identify the edge pairs that contain each edge.



$\kappa^*(x_i) = |\kappa(x_i) - \kappa(\bar{x}_i)|$. Let $\kappa^* = \sum_{i=1}^{n} \kappa^*(x_i)$. The vertex set of $G'(F)$ contains
vertices $F_1, \ldots, F_m$, one per clause; vertices $\gamma_{i,j}$, $1 \le i \le m$ and $1 \le j \le 3$, one
per literal $\lambda_{i,j}$; and vertices $\alpha_1, \ldots, \alpha_{\kappa^*}, \beta_1, \ldots, \beta_{\kappa^*}, \alpha', \beta'$. The vertices of $G'(F)$,
excluding vertices $F_1, \ldots, F_m$, are connected using permanent edges to form a
chain as shown in Figure 2. Besides these permanent edges, graph $G'(F)$ contains
regular edges $e_{i,j} = (F_i, \gamma_{i,j})$, $1 \le i \le m$ and $1 \le j \le 3$, and $f_k = (\alpha_k, \beta_k)$,
$1 \le k \le \kappa^*$. We refer to an edge $e_{i,j}$ as a literal edge and to an edge $f_k$ as a
dummy edge. Graph $G'(F)$ is obviously planar.

Next we group literal and dummy edges into pairs so that, in every graph in
$\mathcal{G}(F)$, a literal edge $e_{i,j}$ is present if and only if every edge $e_{i',j'}$ with $\lambda_{i,j} = \lambda_{i',j'}$
is present and every edge $e_{i'',j''}$ with $\lambda_{i,j} = \bar{\lambda}_{i'',j''}$ is absent. Hence, the presence
and absence of edges in a graph in $\mathcal{G}(F)$ corresponds to a truth assignment to
the variables $x_1, \ldots, x_n$. Consider a variable $x_k$ and the literals $\lambda_{i_1,j_1}, \ldots, \lambda_{i_q,j_q}$
such that $\lambda_{i_h,j_h} = x_k$ or $\lambda_{i_h,j_h} = \bar{x}_k$, for all $1 \le h \le q$. We assume w.l.o.g. that
$\lambda_{i_1,j_1} = \cdots = \lambda_{i_r,j_r} = x_k$ and $\lambda_{i_{r+1},j_{r+1}} = \cdots = \lambda_{i_q,j_q} = \bar{x}_k$, for some $0 \le r \le q$.

We also assume that $\kappa^*(x_k) = \kappa(x_k) - \kappa(\bar{x}_k)$, that is, there are at least as many
positive literals $x_k$ in $F$ as negative literals $\bar{x}_k$. Then we choose a set of $s = \kappa^*(x_k)$
dummy edges $f_{l_1}, \ldots, f_{l_s}$ that have not been included in any pairs yet. We define
the following edge pairs, where $t = q - r$:

  $\{e_{i_h,j_h}, e_{i_{h+r},j_{h+r}}\}$, for $1 \le h \le t$;
  $\{e_{i_{h+r},j_{h+r}}, e_{i_{h+1},j_{h+1}}\}$, for $1 \le h \le \min(t, r - 1)$;
  $\{e_{i_{t+h},j_{t+h}}, f_{l_h}\}$, for $1 \le h \le s$; and
  $\{f_{l_h}, e_{i_{t+h+1},j_{t+h+1}}\}$, for $1 \le h < s$.

Intuitively, we construct a “path” of
edge pairs where edges corresponding to positive and negative literals alternate
until we run out of negative literals; once this happens, we place a dummy edge
between every pair of consecutive positive literals. This ensures that either all
edges corresponding to literal $x_k$ are present and all edges corresponding to
literal $\bar{x}_k$ are absent, or vice versa. Note that, by creating $\kappa^*$ dummy edges, we
ensure that we have enough dummy edges to complete this construction for all
variables $x_1, \ldots, x_n$ while using every dummy edge in the creation of edge pairs
for exactly one variable $x_k$. This guarantees that, indeed, every edge is in at
most two edge pairs, as illustrated in Figure 2.
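The sketch below is a brute-force sanity check of the pairing for a single variable $x_k$. It builds the pairs in the order just described: positive and negative literal edges alternate, and then a dummy edge follows each remaining positive literal edge. It then enumerates every edge subset that contains exactly one edge of each pair. We assume this is how a graph of the family $\mathcal{G}(F)$ is selected when an edge may belong to two pairs. The edge identifiers and the helper names `variable_pairs` and `valid_subsets` are hypothetical.

```python
from itertools import product


def variable_pairs(pos, neg, dummies):
    """Edge pairs for one variable x_k (hypothetical helper).

    pos: ids of the r literal edges for x_k, neg: ids of the t literal edges
    for its negation (r >= t), dummies: s = r - t unused dummy-edge ids.
    """
    r, t, s = len(pos), len(neg), len(dummies)
    assert r >= t and s == r - t
    pairs = [(pos[h], neg[h]) for h in range(t)]                  # x_k vs. negation
    pairs += [(neg[h], pos[h + 1]) for h in range(min(t, r - 1))]  # back to x_k
    pairs += [(pos[t + h], dummies[h]) for h in range(s)]          # dummy after x_k
    pairs += [(dummies[h], pos[t + h + 1]) for h in range(s - 1)]  # next x_k edge
    return pairs


def valid_subsets(pairs):
    """Yield every edge subset containing exactly one edge of every pair."""
    edges = sorted({e for p in pairs for e in p})
    for bits in product((False, True), repeat=len(edges)):
        present = {e for e, b in zip(edges, bits) if b}
        if all((a in present) != (b in present) for a, b in pairs):
            yield present
```

With three occurrences of $x_k$, one of $\bar{x}_k$ and two dummy edges, `valid_subsets` yields exactly two edge sets. One contains all positive literal edges and no negative ones, and the other is the reverse. This matches the claim that the pairs encode a truth value for $x_k$.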

Now we observe that the vertices in the chain formed by the permanent edges
in $G'(F)$ belong to the same connected component $\Pi$ in any graph in $\mathcal{G}(F)$. In a
graph $G \in \mathcal{G}(F)$, vertex $F_i$ is connected to a vertex in $\Pi$ if and only if the truth
assignment corresponding to $G$ satisfies clause $C_i$. Hence, there is a connected
graph in $\mathcal{G}(F)$ if and only if $F$ is satisfiable. Since the construction of $\mathcal{G}(F)$ from
$F$ can easily be carried out in linear time, we obtain the following result.

Theorem 1. Planar $k$-MCP is NP-hard, for any $k \ge 2$.

3 Existence of Short Maximizing Flip Sequences

Given the NP-hardness of $k$-MCP for $k > 1$, we restrict our attention to
1-MCP in the remainder of this paper. In order to solve this problem, we start
with an arbitrary graph $G$ in the given 1-thick family $\mathcal{G}(V,\mathcal{E})$ and then compute
a maximizing sequence of flips. In this section, we prove that every graph in
$\mathcal{G}(V,\mathcal{E})$ has a maximizing sequence of at most $n - 1$ flips. In the next section, we
develop an $O(n^2 m)$-time algorithm that finds such a sequence. First we prove
that every non-maximal graph $G$ in $\mathcal{G}(V,\mathcal{E})$ has a merging flip or a stable flip
that leaves the connected components of $G$ invariant, that is, does not change
their vertex sets. We call such a stable flip strongly stable.

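Lemma 1. Every non-maximal graph $G$ in a 1-thick family $\mathcal{G}(V,\mathcal{E})$ has a merging or
strongly stable flip.

Proof. Every non-maximal graph contains a cycle: every graph in $\mathcal{G}(V,\mathcal{E})$ has
exactly $|\mathcal{E}|$ edges, and a maximal graph $G^*$ satisfies $|\mathcal{E}| \ge n - \omega(G^*) > n - \omega(G)$,
so $G$ cannot be a forest. Flipping any edge in the cycle cannot split any connected
component; hence, such a flip is merging if its complement joins two connected
components of $G$, and strongly stable otherwise. □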

We call the flip of an edge $e$ in a graph $G$ greedy if the endpoints of edge $\bar{e}$
are in different connected components of $G$. A sequence $e_1, \ldots, e_q$ of edge flips
is greedy if, for every $1 \le i \le q$, the flip of edge $e_i$ is greedy for $G(e_1, \ldots, e_{i-1})$.
The following observation establishes two important properties of greedy flips.

Observation 1. The flip of an edge $e \in G$ is merging if and only if it is greedy
and $e$ is not a cut edge of $G$. A greedy flip such that $e$ is a cut edge is stable.

Another way to interpret the second statement in Observation 1 is that stable
greedy flips leave all cycles in $G$ intact. Also observe that, for every greedy flip
$e$, edge $\bar{e}$ is a cut edge of $G(e)$. Ideally, one would hope that every non-maximal
graph has a merging flip, because then a maximal graph could be found efficiently.
However, it is easy to show that this is not the case (see [16]). Next we show
that every non-maximal graph has a short maximizing flip sequence.
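Observation 1 and Lemma 1 are easy to check mechanically on small families. The brute-force sketch below uses the same hypothetical encoding as before: `pairs[i]` holds the two edges of pair $i$, and `choice[i]` selects the edge that is present. For every graph and every flip, it compares the change in $\omega$ with the greedy and cut-edge tests of Observation 1. It also confirms that every non-maximal graph has a merging or strongly stable flip. A cut edge is detected naively, by deleting the edge and recounting components. The function names and toy families are our own illustrations.

```python
from itertools import product


def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def labels(n, edges):
    """A representative vertex of the component of every vertex (union-find)."""
    parent = list(range(n))
    for u, v in edges:
        parent[find(parent, u)] = find(parent, v)
    return [find(parent, x) for x in range(n)]


def omega(n, edges):
    return len(set(labels(n, edges)))


def partition(n, edges):
    """Vertex sets of the connected components, in a comparable form."""
    lab = labels(n, edges)
    return frozenset(frozenset(v for v in range(n) if lab[v] == r) for r in set(lab))


def check_observation1_and_lemma1(n, pairs):
    """Brute-force check over every graph of a small 1-thick family."""
    def graph(choice):
        return [pairs[i][b] for i, b in enumerate(choice)]

    graphs = list(product((0, 1), repeat=len(pairs)))
    best = min(omega(n, graph(c)) for c in graphs)   # omega of the whole family
    for c in graphs:
        edges = graph(c)
        w, lab = omega(n, edges), labels(n, edges)
        has_good_flip = False   # merging or strongly stable flip (Lemma 1)
        for i in range(len(pairs)):
            flipped = graph(c[:i] + (1 - c[i],) + c[i + 1:])
            u, v = pairs[i][1 - c[i]]                 # complementary edge
            greedy = lab[u] != lab[v]
            cut = omega(n, edges[:i] + edges[i + 1:]) > w
            w2 = omega(n, flipped)
            if (w2 < w) != (greedy and not cut):      # Observation 1, first part
                return False
            if greedy and cut and w2 != w:            # Observation 1, second part
                return False
            if w2 < w or partition(n, flipped) == partition(n, edges):
                has_good_flip = True
        if w > best and not has_good_flip:            # Lemma 1
            return False
    return True
```

On the toy family `[((0, 1), (2, 3)), ((1, 2), (1, 3)), ((0, 2), (0, 3))]` with $n = 4$, the function returns `True`. Exhaustive enumeration is only meant for families with a handful of pairs.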

Lemma 2. Every non-maximal graph has a maximizing sequence of at most
$n - 1$ flips.

Proof. Let $G$ be a non-maximal graph in a 1-thick family $\mathcal{G}(V,\mathcal{E})$, let $G^*$ be a
maximal graph in $\mathcal{G}(V,\mathcal{E})$, and let $T^*$ be a spanning forest of $G^*$ that contains a
spanning tree for every connected component of $G^*$. We prove that there exists
a maximizing sequence for $G$ that flips at most all the edges in $G$ whose
complements are in $T^*$. Since there are at most $n - 1$ such edges, this proves the
lemma. The proof is by induction on the number of edges in $G$ whose
complements are in $T^*$. If there is exactly one such edge $e$, the graph $G(e)$ has $T^*$ as a







N. Zeh 



subgraph and, hence, is maximal. Thus, the single flip of edge $e$ is maximizing.
So assume that $G$ contains $r > 1$ edges whose complements are in $T^*$ and that
the lemma holds for every non-maximal graph $G'$ that contains fewer than $r$ edges
whose complements are in $T^*$. Since $T^*$ has fewer connected components than
$G$, there has to be an edge $\bar{e}$ in $T^*$ that connects two vertices in different
connected components of $G$. The flip of its complement $e$ is greedy for $G$; that is,
by Observation 1, this flip is either merging or stable for $G$. If $G(e)$ is maximal, then the sequence
that flips only edge $e$ is maximizing. Otherwise, $G(e)$ is a non-maximal graph
that contains $r - 1$ edges whose complements are in $T^*$. Hence, by the induction
hypothesis, there exists a maximizing sequence $e_1, \ldots, e_t$ for $G(e)$ with $t \le r - 1$.
The sequence $e, e_1, \ldots, e_t$ is maximizing for $G$ and has length at most $r$. □
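Once a maximal graph $G^*$ is known, the proof of Lemma 2 is constructive. The Python sketch below follows it directly. It fixes a spanning forest $T^*$ of $G^*$, repeatedly looks for an edge of $T^*$ that joins two components of the current graph, and flips the complement of that edge. It stops as soon as the current graph has $\omega(G^*)$ components. Graphs use the same hypothetical `choice`-vector encoding as the earlier sketches. The caller must supply a `target` that really is maximal; the sketch does not verify this.

```python
def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def labels(n, edges):
    """A representative vertex of the component of every vertex (union-find)."""
    parent = list(range(n))
    for u, v in edges:
        parent[find(parent, u)] = find(parent, v)
    return [find(parent, x) for x in range(n)]


def graph(pairs, choice):
    return [pairs[i][b] for i, b in enumerate(choice)]


def spanning_forest(n, pairs, target):
    """Pair indices whose target-side edges form a spanning forest T* of G*."""
    parent, forest = list(range(n)), []
    for i, b in enumerate(target):
        ru, rv = find(parent, pairs[i][b][0]), find(parent, pairs[i][b][1])
        if ru != rv:
            parent[ru] = rv
            forest.append(i)
    return forest


def maximizing_sequence(n, pairs, choice, target):
    """Flip sequence built as in the proof of Lemma 2; target must be maximal."""
    forest = spanning_forest(n, pairs, target)
    w_star = len(set(labels(n, graph(pairs, target))))
    cur, seq = list(choice), []
    while True:
        lab = labels(n, graph(pairs, cur))
        if len(set(lab)) == w_star:          # the current graph is maximal
            return seq
        # T* has fewer components than the current graph, so one of its edges
        # joins two components; that edge is absent, so we flip its complement.
        i = next(i for i in forest
                 if lab[pairs[i][target[i]][0]] != lab[pairs[i][target[i]][1]])
        cur[i] = target[i]
        seq.append(i)
```

Each pair is flipped at most once, and only pairs whose target-side edge lies in $T^*$ are flipped. The sequence therefore has at most $n - 1$ flips, as in the lemma. Each iteration is a linear scan, which fits the $O(nm)$ bound mentioned at the start of Section 4.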



4 Finding Short Merging Flip Sequences 

Given that every graph $G$ has a maximizing sequence of at most $n - 1$ flips,
we would like to compute such a sequence efficiently. Given $G$ and a maximal
graph $G^*$, the construction in the proof of Lemma 2 can easily be carried out in
$O(nm)$ time. The problem is finding $G^*$. In this section, we provide an
$O(nm)$-time algorithm that finds a merging sequence of at most $n - 1$ flips for any
non-maximal graph. By applying this procedure at most $n - 2$ times, which
takes $O(n^2 m)$ time, we obtain a maximal graph $G^*$.

Given a graph $G \in \mathcal{G}(V,\mathcal{E})$, we use an auxiliary directed graph $H$, which
is derived from $G$, to find a merging flip sequence for $G$. We prove that every
shortest path between certain vertices in $H$ has length at most $n - 1$ and
corresponds to a merging flip sequence, and that at least one such path exists if $G$
is non-maximal. Hence, all we have to do is apply breadth-first search to $H$ to
either find such a shortest path and report the corresponding flip sequence, or
output that $G$ is maximal if no such path exists.

The vertex set of $H$ is the edge set of $G$. For two edges $e$ and $f$ of $G$, there
is an edge $(e, f)$ (directed from $e$ to $f$) in $H$ if $f$ is a cut edge of $G$, but not of
$G \cup \{\bar{e}\}$. Every edge in $G$ that is not a cut edge corresponds to a source (vertex
of in-degree 0) in $H$; we call such a source a root. Every edge $e$ in $G$ such that $\bar{e}$
has its endpoints in different connected components of $G$ corresponds to a sink
(vertex of out-degree 0) in $H$; we call such a sink a leaf. Every merging sequence
of flips has to flip an edge corresponding to a root, because it has to break at
least one cycle in $G$. It also has to flip at least one edge corresponding to a leaf
in $H$ because, otherwise, the flips in the sequence cannot reduce the number of
connected components. In particular, if we stop the construction in the proof
of Lemma 2 as soon as the number of connected components has been reduced by
one, we obtain a merging sequence of length at most $n - 1$ that starts with
a greedy flip, which corresponds to a leaf in $H$, and ends with a merging flip,
which, by Observation 1, corresponds to a root. Our goal is to show that graph $H$
contains a root-to-leaf path if $G$ is not maximal and that every shortest such path
corresponds to a merging sequence of flips. So assume that $G$ is not maximal.
We call a greedy flip sequence $e_1, \ldots, e_q$ monotone if edges $e_1, \ldots, e_q$ are in $G$.
As shown in the proof of Lemma 2, a monotone merging sequence of length at
most $n - 1$ exists if $G$ is not maximal. The following two lemmas are the first
steps toward showing that there exists a root-to-leaf path in $H$.

Lemma 3. In a shortest monotone merging sequence $e_1, \ldots, e_q$ for $G$, edges
$e_1, \ldots, e_{q-1}$ are cut edges of $G$.

Proof. Assume the contrary and choose $j < q$ minimal so that $e_j$ is not a cut edge;
that is, $e_1, \ldots, e_{j-1}$ are cut edges. We claim that $e_1, \ldots, e_j$ is a monotone
merging sequence of flips. This would contradict the assumption that $e_1, \ldots, e_q$ is
a shortest such sequence for $G$. Sequence $e_1, \ldots, e_j$ is monotone because every
prefix $e_1, \ldots, e_h$ of a monotone flip sequence is monotone. To see
that sequence $e_1, \ldots, e_j$ is merging, we make the following observations: (1)
$\omega(G(e_1, \ldots, e_{j-1})) \le \omega(G)$, because every flip in the merging sequence $e_1, \ldots, e_q$ is
stable or merging. (2) Edge $\bar{e}_j$ has its endpoints in different connected components
of $G(e_1, \ldots, e_{j-1})$, because the sequence $e_1, \ldots, e_q$ is greedy. (3) The cycles of $G$
are invariant under deletion of cut edges. Hence, $e_j$ is not a cut edge of $G(e_1, \ldots, e_{j-1})$ and, by
Observation 1, $\omega(G(e_1, \ldots, e_j)) < \omega(G(e_1, \ldots, e_{j-1})) \le \omega(G)$. □

Lemma 4. In a shortest monotone merging sequence $e_1, \ldots, e_q$ for $G$, $e_q$ is not
a cut edge of $G$.

Proof. Since sequence $e_1, \ldots, e_q$ is a shortest monotone merging sequence, no
prefix $e_1, \ldots, e_j$, $j < q$, is merging. Hence, the flip of edge $e_q$ is merging
for $G(e_1, \ldots, e_{q-1})$. By Observation 1, this implies that $e_q$ is not a cut edge
of $G(e_1, \ldots, e_{q-1})$. If $e_q$ is a cut edge of $G$, we choose $j$ minimal so that $e_q$ is
not a cut edge of $G(e_1, \ldots, e_j)$. Since $e_q$ is a cut edge of $G(e_1, \ldots, e_{j-1})$, the
insertion of edge $\bar{e}_j$ must create a cycle in $G(e_1, \ldots, e_j)$. But then the endpoints
of $\bar{e}_j$ are in the same connected component of $G(e_1, \ldots, e_{j-1})$, contradicting the
greediness of sequence $e_1, \ldots, e_q$. □

Lemma 4 implies that $e_q$ is a root in $H$. Hence, to prove that graph $H$ contains
a root-to-leaf path of length at most $n - 1$ if $G$ is not maximal, it suffices to
show that there exists such a path from $e_q$ to a leaf.

Lemma 5. If $G$ is not maximal, then there exists a root-to-leaf path of length
at most $n - 1$ in $H$.

Proof. Consider a shortest monotone merging flip sequence $e_1, \ldots, e_q$. Then
$q \le n - 1$. We show that, for every edge $e_i$, $1 \le i \le q$, there exists a path of length at
most $i$ from $e_i$ to a leaf in $H$. Hence, there is a path of length at most $n - 1$ from
$e_q$ to a leaf, and $e_q$ is a root. The proof is by induction on $i$. Since the endpoints of
edge $\bar{e}_1$ are in different connected components of $G$, by the greediness of sequence
$e_1, \ldots, e_q$, edge $e_1$ is a leaf of $H$. So assume that $i > 1$ and that the claim holds
for $e_1, \ldots, e_{i-1}$. If $e_i$ is a leaf, the claim holds for $e_i$. Otherwise, the endpoints of
$\bar{e}_i$ are in the same connected component of $G$. But they are in different connected
components of $G(e_1, \ldots, e_{i-1})$; so there must be an edge $e_j$, $j < i$, that is on
all paths in $G$ connecting the endpoints of $\bar{e}_i$, because edges $e_1, \ldots, e_{i-1}$ are cut
edges (Lemma 3). This implies that $e_j$ is an out-neighbour of $e_i$ in $H$. By the induction
hypothesis, there exists a path of length at most $j$ from $e_j$ to a leaf. This implies
that there exists a path of length at most $j + 1 \le i$ from $e_i$ to a leaf. □

Given that graph $H$ is guaranteed to contain a root-to-leaf path if $G$ is not
maximal, it is natural to ask what the relationship between root-to-leaf paths in
$H$ and merging flip sequences for $G$ is. It is easy to show that not every
root-to-leaf path in $H$ corresponds to a merging flip sequence (see [16]). Next we prove
that every shortest root-to-leaf path in $H$ corresponds to a merging sequence.
To prove this fact, we make use of the following two results.

Lemma 6. For a shortest root-to-leaf path $(e_1, \ldots, e_q)$ in $H$ and any $1 \le i < q$,
edges $\bar{e}_i$ and $e_{i+1}$ are in a common simple cycle of the graph $G(e_1, \ldots, e_i)$.

Proof. Assume that the lemma does not hold. Then let $i$ be minimal so that edges
$\bar{e}_i$ and $e_{i+1}$ are not in a common simple cycle in $G(e_1, \ldots, e_i)$. Since edge $(e_i, e_{i+1})$
exists in $H$, edges $\bar{e}_i$ and $e_{i+1}$ are in a common cycle in $G \cup \{\bar{e}_i\}$. We choose $j$
maximal so that $\bar{e}_i$ and $e_{i+1}$ are in a common cycle of $G(e_1, \ldots, e_h) \cup \{\bar{e}_i\}$, for
all $0 \le h \le j$. Then $j < i$, and edges $\bar{e}_i$ and $e_{i+1}$ do not belong to a common
cycle in the graph $G(e_1, \ldots, e_{j+1}) \cup \{\bar{e}_i\}$. Hence, edge $e_{j+1}$ is on the cycle in
$G(e_1, \ldots, e_j) \cup \{\bar{e}_i\}$ that contains edges $e_{i+1}$ and $\bar{e}_i$ (see Figure 3a). Since $\bar{e}_j$ and
$e_{j+1}$ belong to a common cycle in $G(e_1, \ldots, e_j)$ (by the minimality of $i$), the removal of edge $e_{j+1}$ still
leaves a cycle that contains $\bar{e}_i$ and $e_{i+1}$ (Figure 3b), unless the cycle containing
$\bar{e}_j$ and $e_{j+1}$ also contains $e_{i+1}$ (Figure 3c). In the former case, we obtain a
contradiction to the assumption that no cycle containing $\bar{e}_i$ and $e_{i+1}$ exists in
$G(e_1, \ldots, e_{j+1}) \cup \{\bar{e}_i\}$. We prove that, in the latter case, there exists a shorter
path from $e_1$ to $e_q$ in $H$, which contradicts the assumption that path $e_1, \ldots, e_q$
is a shortest root-to-leaf path in $H$.

Since edge $(e_j, e_{j+1})$ exists in $H$, edge $e_{j+1}$ is a cut edge on the path in $G$
that connects the endpoints of edge $\bar{e}_j$. If $e_{i+1}$ is also on this path, then edge
$(e_j, e_{i+1})$ exists in $H$ and $(e_1, \ldots, e_j, e_{i+1}, \ldots, e_q)$ is a shorter path from $e_1$ to $e_q$
in $H$, a contradiction. So assume that $e_{i+1}$ is not on this path. Then edge $e_{i+1}$
is a cut edge of $G$, and both endpoints of edge $\bar{e}_j$ are in the same connected
component of $G - e_{i+1}$. Moreover, none of the edges $\bar{e}_1, \ldots, \bar{e}_j$ connects two vertices
in different connected components of $G$. Hence, the only way to create a path in
$G(e_1, \ldots, e_{j-1})$ that connects the endpoints of edge $\bar{e}_j$ and contains edge $e_{i+1}$ is
by adding an edge $\bar{e}_h$, $h < j$, whose endpoints are in different connected
components of $G - e_{i+1}$, but in the same connected component of $G$ (see Figure 4).
Then, however, edge $(e_h, e_{i+1})$ exists in $H$ and the path $(e_1, \ldots, e_h, e_{i+1}, \ldots, e_q)$
is a shorter path from $e_1$ to $e_q$ in $H$, again a contradiction. □

Fig. 3. (a) Edges $\bar{e}_i$, $e_{i+1}$, and $e_{j+1}$ belong to a cycle in $G(e_1, \ldots, e_j) \cup \{\bar{e}_i\}$. (b) If edge
$e_{i+1}$ is not on the cycle in $G(e_1, \ldots, e_j)$ containing edges $e_{j+1}$ and $\bar{e}_j$, then there exists
a cycle (bold) in $G(e_1, \ldots, e_{j+1}) \cup \{\bar{e}_i\}$ that contains edges $\bar{e}_i$ and $e_{i+1}$. (c) The case
when edge $e_{i+1}$ is on the cycle in $G(e_1, \ldots, e_j)$ containing edges $e_{j+1}$ and $\bar{e}_j$.

Fig. 4. The proof that there must be an edge $(e_h, e_{i+1})$ in $H$, where $h < j$, if edge $e_{i+1}$
is on the cycle in $G(e_1, \ldots, e_j)$ that contains edges $e_{j+1}$ and $\bar{e}_j$.

Corollary 1. Let $e_1, \ldots, e_q$ be a shortest root-to-leaf path in $H$. Then, for every
$1 \le i < q$, the flip of edge $e_i$ is strongly stable for $G(e_1, \ldots, e_{i-1})$.

Using Lemma 6 and Corollary 1, we can now prove that every shortest root- 
to-leaf path in H corresponds to a merging flip sequence for G. 

Lemma 7. A shortest root-to-leaf path in $H$ corresponds to a merging sequence
of flips for $G$.

Proof. Consider a shortest root-to-leaf path $e_1, \ldots, e_q$ in $H$. By Corollary 1,
all flips in the sequence $e_1, \ldots, e_{q-1}$ are strongly stable. By Lemma 6, edge $e_q$
belongs to a cycle in $G(e_1, \ldots, e_{q-1})$. By Corollary 1, edge $\bar{e}_q$ has its endpoints
in different connected components of $G(e_1, \ldots, e_{q-1})$, because this is true in $G$.
Hence, by Observation 1, the flip of edge $e_q$ is merging for $G(e_1, \ldots, e_{q-1})$, and
the whole sequence $e_1, \ldots, e_q$ is merging for $G$. □

Given this correspondence between shortest root-to-leaf paths in $H$ and merging
sequences for $G$, we can find a merging sequence of flips for $G$ in $O(nm)$ time.
First we create the vertex set of graph $H$ by adding a vertex for every edge of
$G$. Next we identify the cut edges of $G$ and label as roots all those vertices in $H$
whose corresponding edges in $G$ are not cut edges. We contract every 2-edge-connected
component into a single vertex and call the resulting graph $G'$. We
compute its connected components, which are trees, and root each such tree at
an arbitrary vertex. To identify the edge set of $H$ and the leaves of $H$, we scan
the set of complementary edges of the edges in $G$. We discard complementary edges that have
become loops as the result of the contraction of the 2-edge-connected components
of $G$, because they run parallel to paths in 2-edge-connected components
of $G$ and, hence, the corresponding vertices of $H$ neither have any out-neighbours nor are leaves. We mark
a vertex $e$ in $H$ as a leaf if the endpoints of edge $\bar{e}$ are in different connected
components of $G'$. For every edge $\bar{e}$ that is not a loop and that has its endpoints
in the same connected component of $G'$, and every edge $f$ on the tree path
connecting the endpoints of $\bar{e}$, we add an edge $(e, f)$ to $H$. These edges $f$ can be
identified by traversing the paths from the endpoints of $\bar{e}$ to their lowest common ancestor (LCA) in the tree.
Since there are at most $n - 1$ cut edges in $G$, the edge set of $G'$ has size at
most $n - 1$; the vertex set has size at most $n$. There are $m$ edges $\bar{e}$. Hence, after
constructing $G'$, which takes $O(n + m)$ time, it takes $O(nm)$ time to construct
$H$. Now we run multi-source BFS from all roots of $H$ simultaneously to decide
whether there exists a root-to-leaf path in $H$ and, if so, find a shortest such path;
that is, we place all the roots at the first level of the BFS and then grow the
BFS forest level by level as usual. This takes $O(nm)$ time. If $G$ is not maximal,
our discussion implies that this procedure finds a root-to-leaf path. Since we use
BFS to find it, it is a shortest such path; in particular, the resulting path has
length at most $n - 1$. Hence, we report the sequence of flips corresponding to
the vertices on the path as a merging flip sequence.
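The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation. Graphs are `choice` vectors over edge pairs as in the earlier sketches, so vertices of $H$ are pair indices. Cut edges are found with a standard low-link DFS keyed on edge ids, which handles parallel edges correctly. The 2-edge-connected components are the components of $G$ minus its cut edges, and each tree of the bridge forest $G'$ is rooted by BFS. Out-neighbours are collected by walking parent pointers up to the lowest common ancestor. The walk is naive, so each complementary edge costs time proportional to its tree path; this stays within $O(nm)$ but is not tuned further. The function names are hypothetical.

```python
from collections import deque


def find_bridges(n, edges):
    """Indices of cut edges (bridges), via an iterative low-link DFS on edge ids."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    disc, low, bridges, timer = [-1] * n, [0] * n, set(), 0
    for s in range(n):
        if disc[s] != -1:
            continue
        disc[s] = low[s] = timer = timer + 1
        stack = [(s, -1, iter(adj[s]))]
        while stack:
            v, pe, it = stack[-1]
            for w, idx in it:
                if idx == pe:                          # skip the edge we came in on
                    continue
                if disc[w] == -1:                      # tree edge: descend
                    disc[w] = low[w] = timer = timer + 1
                    stack.append((w, idx, iter(adj[w])))
                    break
                low[v] = min(low[v], disc[w])          # non-tree edge
            else:                                      # v is finished
                stack.pop()
                if stack:
                    p = stack[-1][0]
                    low[p] = min(low[p], low[v])
                    if low[v] > disc[p]:
                        bridges.add(pe)
    return bridges


def merging_flip_sequence(n, pairs, choice):
    """Pair indices on a shortest root-to-leaf path of H, or None if G is maximal."""
    edges = [pairs[i][b] for i, b in enumerate(choice)]
    comps = [pairs[i][1 - b] for i, b in enumerate(choice)]   # complementary edges
    m, bridges = len(edges), find_bridges(n, edges)
    adj = [[] for _ in range(n)]                  # G minus its cut edges
    for idx, (u, v) in enumerate(edges):
        if idx not in bridges:
            adj[u].append(v)
            adj[v].append(u)
    label, c = [-1] * n, 0                        # 2-edge-connected component ids
    for s in range(n):
        if label[s] == -1:
            label[s], stack = c, [s]
            while stack:
                for y in adj[stack.pop()]:
                    if label[y] == -1:
                        label[y] = c
                        stack.append(y)
            c += 1
    tadj = [[] for _ in range(c)]                 # bridge forest G'
    for idx in bridges:
        a, b = label[edges[idx][0]], label[edges[idx][1]]
        tadj[a].append((b, idx))
        tadj[b].append((a, idx))
    root, parent, pedge, depth = [-1] * c, [-1] * c, [-1] * c, [0] * c
    for r in range(c):                            # root every tree of G' by BFS
        if root[r] == -1:
            root[r], queue = r, deque([r])
            while queue:
                x = queue.popleft()
                for y, idx in tadj[x]:
                    if root[y] == -1:
                        root[y], parent[y], pedge[y] = r, x, idx
                        depth[y] = depth[x] + 1
                        queue.append(y)
    out, leaf = [[] for _ in range(m)], [False] * m
    for e in range(m):
        a, b = label[comps[e][0]], label[comps[e][1]]
        if root[a] != root[b]:                    # complement joins two components
            leaf[e] = True
            continue
        while a != b:                             # walk up to the LCA; loops skip this
            if depth[a] < depth[b]:
                a, b = b, a
            out[e].append(pedge[a])               # cut edge on the tree path
            a = parent[a]
    pred = {e: None for e in range(m) if e not in bridges}    # roots of H
    queue = deque(pred)                           # multi-source BFS, roots at level 0
    while queue:
        x = queue.popleft()
        if leaf[x]:
            path = [x]
            while pred[path[-1]] is not None:
                path.append(pred[path[-1]])
            return path[::-1]
        for y in out[x]:
            if y not in pred:
                pred[y] = x
                queue.append(y)
    return None
```

All roots enter the queue at level 0, and the first leaf dequeued ends a shortest root-to-leaf path. By Lemma 7, flipping the returned pairs in order is therefore a merging sequence. A return value of `None` means that $H$ has no root-to-leaf path, which by Lemma 5 happens only when $G$ is maximal.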

Theorem 2. It takes $O(nm)$ time to decide whether a graph $G$ in a 1-thick
graph family $\mathcal{G}(V,\mathcal{E})$ is maximal and, if not, to find a merging sequence of at most
$n - 1$ flips for $G$.

Since any graph in $\mathcal{G}(V,\mathcal{E})$ has at most $n - 1$ connected components if $\mathcal{E} \neq \emptyset$,
we can apply the algorithm sketched above at most $n - 2$ times before we obtain
a connected graph, which is maximal. Thus, we start with an arbitrary graph $G$
in $\mathcal{G}(V,\mathcal{E})$ and repeatedly apply Theorem 2 to compute a maximizing sequence
of at most $(n - 2)(n - 1)$ flips. We apply these flips in $O(n^2)$ time to obtain a
maximal graph $G^* \in \mathcal{G}(V,\mathcal{E})$. As pointed out at the beginning of this section,
we can compute a maximizing sequence of at most $n - 1$ flips for $G$ in $O(nm)$
time, once $G^*$ is given. This proves the following result.

Corollary 2. It takes $O(n^2 m)$ time to compute a maximal graph in a 1-thick
graph family $\mathcal{G}(V,\mathcal{E})$ and to compute a maximizing sequence of at most $n - 1$
flips for a given graph $G$ in $\mathcal{G}(V,\mathcal{E})$.



5 Connectivity of Sub-families Under Edge Flips

Since every graph in a 1-thick family $\mathcal{G}(V,\mathcal{E})$ can be transformed into a maximal
graph, an interesting question to ask is whether, for any two graphs $G_1$ and $G_2$
in $\mathcal{G}(V,\mathcal{E})$ with at most $k$ connected components, there exists a flip sequence
$e_1, \ldots, e_q$ that transforms $G_1$ into $G_2$ and such that $\omega(G_1(e_1, \ldots, e_i)) \le k$, for
all $1 \le i \le q$. We call such a sequence $k$-stable. Note that some of the flips in a
$k$-stable sequence may be splitting. Another question to ask is how many flips
such a sequence has to contain. Formally, we ask whether the graph $\mathcal{G}(V,\mathcal{E},k)$
is connected and what its diameter is. We prove that $\mathcal{G}(V,\mathcal{E},k)$ is connected
for every $k > \omega(\mathcal{G}(V,\mathcal{E}))$, and that $\mathcal{G}(V,\mathcal{E},\omega(\mathcal{G}(V,\mathcal{E})))$ may be disconnected. We
also show that, for $k > \omega(\mathcal{G}(V,\mathcal{E}))$, $\mathcal{G}(V,\mathcal{E},k)$ has diameter at most $n + m - 1$.

To show this, we prove a fact that is, in a sense, orthogonal to Lemma 2: while
Lemma 2 shows that every non-maximal graph can be transformed into some
maximal graph using a merging sequence of at most $n - 1$ flips, we prove next
that every non-maximal graph can be transformed into any maximal graph using
a $k$-stable sequence of at most $m$ flips, some of whose flips may be splitting.

Fig. 5. A family $\mathcal{G}(V,\mathcal{E})$ of graphs such that $\mathcal{G}(V,\mathcal{E},\omega(\mathcal{G}(V,\mathcal{E})))$ is disconnected.

Lemma 8. For any graph family $\mathcal{G}(V,\mathcal{E})$ and any $k > \omega(\mathcal{G}(V,\mathcal{E}))$, $\mathcal{G}(V,\mathcal{E},k)$ is
connected and has diameter at most $n + m - 1$.

Proof sketch. The proof of Lemma 2 can easily be adapted to show that any
non-maximal graph $G_1$ can be transformed into any maximal graph $G_2$ using a
$k$-stable sequence of at most $m$ flips. Indeed, the only difference is that, instead
of stopping when a maximal graph $G'$ is obtained, we keep flipping edges until
$G_2$ is obtained. The crucial observation is that, if the current graph $G'$ is not
maximal, there exists an edge in $G'$ that is not in $G_2$ and whose flip is stable or
merging. If $G'$ is maximal, we can flip any edge $e$ in $G'$ that is not in $G_2$; this
results in a graph $G'(e)$ with $\omega(G'(e)) \le \omega(G') + 1 \le k$.

The lemma now follows because, for any two graphs $G_1$ and $G_2$ in $\mathcal{G}(V,\mathcal{E},k)$,
we first find a maximizing sequence $e_1, \ldots, e_q$ of at most $n - 1$ flips that
transforms $G_1$ into a maximal graph $G_3$. Such a sequence exists by Lemma 2.
Then we find a $k$-stable sequence $e'_1, \ldots, e'_r$ of at most $m$ flips that transforms $G_2$
into $G_3$. The sequence $e_1, \ldots, e_q, e'_r, \ldots, e'_1$, which has length $q + r \le n + m - 1$,
transforms $G_1$ into $G_2$ and is $k$-stable. □

The statement of Lemma 8 is true for any $k > \omega(\mathcal{G}(V,\mathcal{E}))$. For $k = \omega(\mathcal{G}(V,\mathcal{E}))$,
the example in Figure 5 shows that $\mathcal{G}(V,\mathcal{E},\omega(\mathcal{G}(V,\mathcal{E})))$ is not necessarily
connected: the horizontal edges are permanent; that is, they represent connector
graphs. Two graphs in the family are $G_1$, which includes the permanent edges
and the solid vertical edges, and $G_2$, which includes the permanent edges and
the dashed vertical edges. Flipping any vertical edge in $G_1$ increases the number
of connected components by one. Hence, there is no $\omega(\mathcal{G}(V,\mathcal{E}))$-stable sequence
of flips that transforms $G_1$ into $G_2$, which proves the following lemma.

Lemma 9. There exists a graph family $\mathcal{G}(V,\mathcal{E})$ such that $\mathcal{G}(V,\mathcal{E},\omega(\mathcal{G}(V,\mathcal{E})))$ is
disconnected.
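For very small families, the connectivity and diameter of $\mathcal{G}(V,\mathcal{E},k)$ can be checked by exhaustive search. The sketch below enumerates all $2^{|\mathcal{E}|}$ graphs in the hypothetical `choice`-vector encoding and keeps those with at most $k$ components. It joins two graphs that differ in exactly one pair and runs BFS from every vertex. The second toy family in the usage note uses two parallel copies of the edges $(0,1)$ and $(1,2)$, treated as distinct edges. It is our own miniature in the spirit of Lemma 9, not the family of Figure 5.

```python
from collections import deque
from itertools import product


def omega(n, edges):
    """Components of ({0, ..., n-1}, edges); parallel edges are fine."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(x) for x in range(n)})


def flip_graph_stats(n, pairs, k):
    """(connected?, diameter) of G(V, E, k) by brute force: vertices are the
    graphs with at most k components, edges join graphs one flip apart."""
    verts = [c for c in product((0, 1), repeat=len(pairs))
             if omega(n, [pairs[i][b] for i, b in enumerate(c)]) <= k]
    vset, diameter = set(verts), 0
    for s in verts:                                   # BFS from every vertex
        dist, queue = {s: 0}, deque([s])
        while queue:
            x = queue.popleft()
            for i in range(len(x)):
                y = x[:i] + (1 - x[i],) + x[i + 1:]   # flip pair i
                if y in vset and y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        if len(dist) < len(verts):
            return False, None
        diameter = max(diameter, max(dist.values()))
    return True, diameter
```

For the family `[((0, 1), (1, 2)), ((1, 2), (0, 1))]` on three vertices, the two connected graphs differ in both pairs. Every single flip produces two parallel edges and two components, so `flip_graph_stats(3, family, 1)` returns `(False, None)`. With `k = 2`, it returns `(True, 2)`, which is within the bound $n + m - 1 = 4$ of Lemma 8.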
6 Concluding Remarks

The algorithm for finding a merging sequence of flips takes $O(nm)$ time.
Recently, an improved $O(n + m)$-time algorithm for this problem has been
obtained [17]. This allows the computation of a maximal graph in a 1-thick family
in $O(nm)$ time. The algorithm is a refinement of the ideas presented here. In
light of these developments, the most interesting open question is whether there
is a linear-time algorithm that solves 1-MCP. In [17], we conjecture that this is
the case and provide some intuition why this should be true.
Abstract. We improve on the classical Nemhauser-Trotter Theorem,
which is a key tool for the Minimum (Weighted) Vertex Cover
problem in the design of both approximation algorithms and exact
fixed-parameter algorithms. Namely, we provide in polynomial time, for a graph
$G$ with vertex weights $w : V \to (0, \infty)$, a partition of $V$ into three subsets
$V_0, V_1, V_{1/2}$, with no edges between $V_0$ and $V_{1/2}$ or within $V_0$, such that
the size of a minimum vertex cover for the graph induced by $V_{1/2}$ is at
least $\frac{1}{2} w(V_{1/2})$, and every minimum vertex cover $C$ for $(G, w)$ satisfies
$V_1 \subseteq C \subseteq V_1 \cup V_{1/2}$.

We also demonstrate one possible application of this strengthening of the
NT-Theorem for fixed-parameter tractable problems related to Min-VC:
for an integer parameter $k$, to find all minimum vertex covers of size at
most $k$, or to find a minimum vertex cover of size at most $k$ under some
additional constraints.



1 Introduction 

The (weighted) Vertex Cover problem (Min-w-VC for short) is one of the
fundamental NP-hard problems in combinatorial optimization. In spite of a
great deal of effort, the tight bound on its approximability by a polynomial-time
algorithm remains open. Recall that the problem has a simple 2-approximation
algorithm and that currently the best lower bound on polynomial-time approximability
is $10\sqrt{5} - 21 \approx 1.36067$, due to Dinur and Safra [7]. The parametrized version of
the Vertex Cover problem is a well-known fixed-parameter tractable (FPT)
problem and has received considerable interest: for a given graph and a positive
integer $k$, the problem is to find a vertex cover of weight at most $k$ or to report
that no such vertex cover exists.

A key tool for the approximation algorithms and for the parametrized version
of Min-w-VC is the Nemhauser-Trotter Theorem (NT-Theorem). The
NT-Theorem efficiently reduces the Min-w-VC problem to instances $(G, w)$ in which

* Supported by the EU-Project ARACNE, Approximation and Randomized
Algorithms in Communication Networks, HPRN-CT-1999-00112.
the size of a minimum vertex cover is at least $\frac{1}{2} w(V)$. This result is useful as
an approximation-preserving preprocessing step that reduces the problem of
finding one optimal vertex cover to more restricted instances. In the parametrized
Min-VC, any kernelization technique reduces, jointly or independently,
both the size of the input graph and the parameter. As observed in [4], the
NT-Theorem allows one to find a linear-size problem kernel for Min-VC efficiently.

But the original NT-Theorem and also other known kernelization techniques
(see [1], [15]) are less efficient if, for a graph $(G, w)$ and an integer parameter $k$,
the goal is to find all minimum vertex covers of weight at most $k$, or to find a
minimum vertex cover of weight at most $k$ under some additional constraints.
In this contribution we overcome this difficulty and prove a theorem much
stronger than the classical NT-Theorem. We also show how this result can be
applied to obtain new algorithmic and complexity results for fixed-parameter
tractable problems related to Min-w-VC (unweighted, for simplicity).

Preliminaries. Let $G = (V, E)$ be a graph with vertex weights $w : V \to (0, \infty)$.
For a set of vertices $U \subseteq V$, let $N(U) := \{v \in V : \exists u \in U \text{ such that } \{u, v\} \in E\}$
stand for the set of its neighbors, and let $G[U]$ denote the subgraph of $G$ induced
by $U$. The weight of a vertex subset $U \subseteq V$ is defined by $w(U) := \sum_{u \in U} w(u)$.

Minimum Weighted Vertex Cover (Min-w-VC)

Instance: A simple graph $G = (V, E)$ with vertex weights $w : V \to (0, \infty)$.
Feasible solution: A vertex cover $C$ for $G$, i.e., a subset $C \subseteq V$ such that
$e \cap C \neq \emptyset$ for each $e \in E$.

Objective function: The weight $w(C) := \sum_{u \in C} w(u)$ of the vertex cover $C$.

The unweighted version of the Minimum Vertex Cover problem (Min-VC for
short) is the special case of Min-w-VC with uniform weights $w(u) = 1$ for
each $u \in V$. Let $VC(G, w)$ be the set of all minimum vertex covers for $(G, w)$
and let $vc(G, w)$ stand for the weight of a minimum vertex cover for $(G, w)$. In
the unweighted case we write simply $VC(G)$ and $vc(G)$.

The Min-w-VC problem can be expressed as an Integer Program (IP) as follows:
the goal is to minimize the function $w(x) := \sum_{u \in V} w(u)\, x(u)$, where $x(u) \in \{0, 1\}$
for each $u \in V$, and a feasible solution $x : V \to \{0, 1\}$ has to satisfy the
edge constraints $x(u) + x(v) \ge 1$ for each edge $\{u, v\} \in E$. There is a one-to-one
correspondence between the set of vertex covers for $G$ and the set of functions
$x : V \to \{0, 1\}$ satisfying all edge constraints; each such $x$ is the indicator function
of some vertex cover for $G$.

The Linear Programming (LP) relaxation of the IP formulation allows $x(u) \in [0, 1]$
(or even $x(u) \ge 0$). It is well known ([12], [14]) that there always exists
an optimal solution of the LP relaxation with the variables $x(u) \in \{0, \frac{1}{2}, 1\}$.
The Half-Integral (HI) relaxation has exactly the same formulation as the IP
formulation, but it allows variables $x(u)$ from the set $\{0, \frac{1}{2}, 1\}$ for each $u \in V$.
Hence a feasible solution is a half-integral vertex cover for $G$, i.e., a function
$x : V \to \{0, \frac{1}{2}, 1\}$ satisfying the edge constraints $x(u) + x(v) \ge 1$ for each edge
$\{u, v\} \in E$. Let $VC^*(G, w)$ be the set of all minimum half-integral vertex covers
$x : V \to \{0, \frac{1}{2}, 1\}$, and let $vc^*(G, w)$ stand for the weight of a minimum half-integral






M. Chlebík and J. Chlebíková



vertex cover for (G,w). For a minimum half-integral vertex cover x for (G,w), 
we denote Vf := {u&V : x{u) = i} for each t G {0, 1}. 

Clearly, $vc^*(G,w)\le vc(G,w)$, as for any vertex cover $C$ its indicator function $x^C$ is a feasible solution for the HI-relaxed problem with $w(x^C)=w(C)$. Further, $vc^*(G,w)\le\frac12 w(V)$, as the function $x\equiv\frac12$ on $V$ is always a feasible solution for the HI-relaxation. The (weighted) graphs $G=(V,E)$ for which the equality $vc^*(G,w)=\frac12 w(V)$ holds play a special role in the Min-w-VC problem: all the difficulty of solving the problem exactly, or of approximating it, reduces to such graphs.
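For very small instances the two optima can be compared by exhaustive search. The sketch below is an illustration only (exponential in $|V|$; the helper names `min_cover_weight`, `vc` and `vc_star` are hypothetical): it enumerates all $x:V\to\{0,1\}$, respectively all $x:V\to\{0,\frac12,1\}$, satisfying the edge constraints and returns the minimum of $w(x)$.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def min_cover_weight(vertices, edges, w, values):
    """Minimum of w(x) = sum_u w(u) x(u) over all x: V -> values with
    x(u) + x(v) >= 1 on every edge {u, v} (exhaustive search)."""
    best = None
    for vals in product(values, repeat=len(vertices)):
        x = dict(zip(vertices, vals))
        if all(x[u] + x[v] >= 1 for u, v in edges):
            weight = sum(w[u] * x[u] for u in vertices)
            if best is None or weight < best:
                best = weight
    return best


def vc(vertices, edges, w):
    # Integer program: x(u) in {0, 1}
    return min_cover_weight(vertices, edges, w, (0, 1))


def vc_star(vertices, edges, w):
    # Half-integral relaxation: x(u) in {0, 1/2, 1}
    return min_cover_weight(vertices, edges, w, (0, HALF, 1))


if __name__ == "__main__":
    V = ["a", "b", "c"]
    E = [("a", "b"), ("b", "c"), ("a", "c")]  # triangle
    w = {u: 1 for u in V}
    print(vc(V, E, w), vc_star(V, E, w))  # 2 3/2
```

On the unweighted triangle this gives $vc=2$ and $vc^*=\frac32=\frac12 w(V)$, so the triangle is one of the graphs with $vc^*(G,w)=\frac12 w(V)$ discussed above.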

Overview. The main goal of Section 2 is to provide the following strengthened version of the Nemhauser-Trotter Theorem; its full version with additional properties is contained in Theorem 2.

Optimal version of Nemhauser-Trotter Theorem. There exists a polynomial time algorithm that partitions the vertex set $V$ of any graph $(G,w)$ with vertex weights $w:V\to[0,\infty)$ into three subsets $V_0$, $V_1$, $V_{1/2}$ with no edges between $V_0$ and $V_{1/2}$ or within $V_0$ such that (i) $vc(G[V_{1/2}],w)\ge\frac12 w(V_{1/2})$; and (ii) every minimum vertex cover $C$ for $(G,w)$ satisfies $V_1\subseteq C\subseteq V_1\cup V_{1/2}$, and $C\cap V_{1/2}$ is a minimum vertex cover for $(G[V_{1/2}],w)$.

The main difference is that condition (ii) is satisfied for every minimum vertex cover for $(G,w)$, not merely for some of them. Such a result was previously known only in the unweighted bipartite case, where it follows from matching theory (the Gallai-Edmonds Structure Theorem).

The key point of this improvement is that one minimum half-integral vertex cover, called the pivot, induces a decomposition $V=V_0\cup V_1\cup V_{1/2}$ that has the desired additional quality. In this case the kernel $V_{1/2}$, to which the problem is reduced, is the largest among all $V^y_{1/2}$, $y\in VC^*(G,w)$; in fact $V_{1/2}=\bigcup_{y\in VC^*(G,w)}V^y_{1/2}$. It is also crucial for applications that the pivot can be found efficiently.

In Section 3 we provide a useful decomposition of a weighted graph $(G,w)$ that reflects the structure of the set of all optimal solutions of the HI-relaxed problem. This decomposition into "irreducible parts" carries information on how all minimum vertex covers for $(G,w)$ are structured outside the minimal relevant kernel $K:=\bigcap_{y\in VC^*(G,w)}V^y_{1/2}$. As a simple byproduct we obtain a polynomial time algorithm that decides whether a minimum vertex cover has the same weight as the optimum of the HI-relaxed (equivalently, LP-relaxed) problem, and if so, finds one minimum vertex cover for $(G,w)$.

In Section 4 we demonstrate one possible application, namely how our improvement of the NT-Theorem can be used for fixed-parameter tractable problems related to the Min-VC problem (unweighted, for simplicity). The strengthened NT-Theorem provides an efficient reduction to a linear size problem kernel even in situations in which the original NT-Theorem is less effective, e.g., if the task is to find all minimum vertex covers for $G$ if $vc(G)\le k$, or to report that $vc(G)>k$; similarly, if the task is, assuming $vc(G)\le k$, to find a minimum vertex cover for $G$ under some additional constraints.
2 The Optimal Version of Nemhauser-Trotter Theorem

In the following series of lemmas we study the basic properties of minimum half-integral vertex covers. Recall that for a minimum half-integral vertex cover $x$ for $(G,w)$, $V^x_i:=\{u\in V: x(u)=i\}$ for each $i\in\{0,\frac12,1\}$.

A minimum half-integral vertex cover $x$ with the property that $V^x_{1/2}$ is not a proper subset of $V^y_{1/2}$ for any $y$ from $VC^*(G,w)$ is called a pivot. It plays a special role in what follows.
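The definition of a pivot can be checked directly on tiny graphs. The following exhaustive-search sketch (hypothetical helper names; exponential running time, so it is not the polynomial algorithm of Theorem 1) lists $VC^*(G,w)$ and keeps those $x$ whose set $V^x_{1/2}$ is not a proper subset of any other $V^y_{1/2}$.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def minimum_half_integral_covers(vertices, edges, w):
    """All elements of VC*(G, w), found by exhaustive search."""
    feasible = []
    for vals in product((0, HALF, 1), repeat=len(vertices)):
        x = dict(zip(vertices, vals))
        if all(x[u] + x[v] >= 1 for u, v in edges):
            feasible.append((sum(w[u] * x[u] for u in vertices), x))
    best = min(weight for weight, _ in feasible)  # x == 1 is always feasible
    return [x for weight, x in feasible if weight == best]


def pivots(vertices, edges, w):
    """Elements x of VC*(G, w) whose half-part V^x_{1/2} is maximal under inclusion."""
    optima = minimum_half_integral_covers(vertices, edges, w)
    halves = [frozenset(u for u in vertices if x[u] == HALF) for x in optima]
    return [x for x, h in zip(optima, halves) if not any(h < g for g in halves)]


if __name__ == "__main__":
    print(pivots(["a", "b"], [("a", "b")], {"a": 1, "b": 1}))
```

For a single edge $\{a,b\}$ with unit weights, $VC^*$ contains $(1,0)$, $(0,1)$ and $(\frac12,\frac12)$, and only the last one is returned.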

Lemma 1. Given a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$, and a partition $V=V^x_0\cup V^x_{1/2}\cup V^x_1$ according to a fixed minimum half-integral vertex cover $x$ for $(G,w)$. Then

(i) $V^x_0$ is an independent set, $\Gamma(V^x_0)\subseteq V^x_1$, and the vertices of $V^x_1\setminus\Gamma(V^x_0)$ have weight 0.

(ii) If $x$ is a pivot, then $V^x_0$ does not contain vertices of weight 0, and $V^x_1=\Gamma(V^x_0)$.

(iii) $vc^*(G[V^x_{1/2}],w)=\frac12 w(V^x_{1/2})$.

(iv) For each $U\subseteq V^x_1$, $w(\Gamma(U)\cap V^x_0)\ge w(U)$. If $x$ is a pivot, then $\emptyset\neq U\subseteq V^x_1$ implies $w(\Gamma(U)\cap V^x_0)>w(U)$.

Lemma 2. Given a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$, and a partition $V=V^x_0\cup V^x_{1/2}\cup V^x_1$ according to a fixed minimum half-integral vertex cover $x$ for $(G,w)$. Then

(i) $V^x_1$ is a minimum vertex cover for $(G[V^x_0\cup V^x_1],w)$, hence $vc(G[V^x_0\cup V^x_1],w)=w(V^x_1)$. If $x$ is a pivot, then $V^x_1$ is the unique minimum vertex cover for $(G[V^x_0\cup V^x_1],w)$.

(ii) $x|_{V^x_0\cup V^x_1}$ is a minimum half-integral vertex cover for $(G[V^x_0\cup V^x_1],w)$, hence $vc^*(G[V^x_0\cup V^x_1],w)=w(V^x_1)$. If $x$ is a pivot, then $x|_{V^x_0\cup V^x_1}$ is the unique minimum half-integral vertex cover for $(G[V^x_0\cup V^x_1],w)$.

Lemma 3. Given a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$, and a partition $V=V^x_0\cup V^x_{1/2}\cup V^x_1$ according to a fixed minimum half-integral vertex cover $x$ for $(G,w)$. Then the following holds:

(i) Every (minimum) vertex cover for $(G[V^x_{1/2}],w)$ together with $V^x_1$ forms a (minimum) vertex cover for $(G,w)$. Every (minimum) half-integral vertex cover for $(G[V^x_{1/2}],w)$ extended by 1 on $V^x_1$ and by 0 on $V^x_0$ forms a (minimum) half-integral vertex cover for $(G,w)$.

(ii) For every minimum vertex cover $C$ for $(G,w)$, $C\cap V^x_{1/2}$ and $C\cap(V^x_0\cup V^x_1)$ are minimum vertex covers for $(G[V^x_{1/2}],w)$ and $(G[V^x_0\cup V^x_1],w)$, respectively. For every minimum half-integral vertex cover $y$ for $(G,w)$, $y|_{V^x_{1/2}}$ and $y|_{V^x_0\cup V^x_1}$ are minimum half-integral vertex covers for $(G[V^x_{1/2}],w)$ and $(G[V^x_0\cup V^x_1],w)$, respectively.

(iii) If $x$ is a pivot, then for every minimum vertex cover $C$ for $(G,w)$ it holds that $V^x_1\subseteq C\subseteq V^x_1\cup V^x_{1/2}$. Analogously, for every minimum half-integral vertex cover $y$ for $(G,w)$ it holds that $y|_{V^x_0}\equiv 0$ and $y|_{V^x_1}\equiv 1$.

In a bipartite graph a minimum vertex cover may be identified from the solution of the corresponding Minimum Cut problem. It can be found by efficient algorithms for the Maximum Flow problem on bipartite graphs (see Lawler [11]). For instance, the problem is solvable in time $O(|E||V|\log(|V|^2/|E|))$ using Goldberg and Tarjan's algorithm [9]. When the problem is unweighted, Dinic's algorithm for the Maximum Flow problem runs in $O(|E|\sqrt{|V|})$ time. Another approach in the unweighted case is based on bipartite matching theory. A maximum matching of a bipartite graph can be constructed in time $O(|E|\sqrt{|V|})$ by the algorithm of Hopcroft and Karp [10] (or even, for general graphs, by the algorithm of Micali and Vazirani), and a minimum vertex cover for a bipartite graph can be constructed from a maximum matching in linear time.
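The step from a maximum matching to a minimum vertex cover is König's construction. A minimal sketch, assuming unit weights and using the simple augmenting-path (Kuhn) matching algorithm instead of Hopcroft-Karp, so it runs in $O(|V||E|)$ rather than $O(|E|\sqrt{|V|})$; the function name is hypothetical. Vertices reachable from unmatched left vertices by alternating paths are collected, and the cover consists of the unreached left vertices plus the reached right vertices.

```python
from collections import deque


def konig_vertex_cover(left, adj):
    """Minimum vertex cover of a bipartite graph; adj maps each left vertex
    to the list of its right neighbours."""
    owner = {}    # right vertex -> left vertex matched to it
    partner = {}  # left vertex -> right vertex matched to it

    def augment(u, seen):
        # Kuhn's augmenting-path search from left vertex u
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in owner or augment(owner[v], seen):
                    owner[v] = u
                    partner[u] = v
                    return True
        return False

    for u in left:
        augment(u, set())
    # Alternating search: left -> right along any edge, right -> left along the matching.
    reach_l = {u for u in left if u not in partner}
    reach_r = set()
    queue = deque(reach_l)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in reach_r:
                reach_r.add(v)
                m = owner.get(v)
                if m is not None and m not in reach_l:
                    reach_l.add(m)
                    queue.append(m)
    return (set(left) - reach_l) | reach_r


if __name__ == "__main__":
    # path a - b - c with left side {a, c} and right side {b}
    print(konig_vertex_cover(["a", "c"], {"a": ["b"], "c": ["b"]}))
```

The cover has exactly as many vertices as the matching has edges, which is König's theorem for bipartite graphs.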

In what follows we define for a graph $(G,w)$ its weighted bipartite version $(G^b,w^b)$ and observe that the optimal solutions of the HI-relaxation of the Min-w-VC problem for $(G,w)$ are generated by minimum weight covers for the corresponding bipartite graph $(G^b,w^b)$.

Definition 1. For a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$ we define its weighted bipartite version $(G^b,w^b)$, with $G^b=(V^b,E^b)$, as follows: there are two copies $u^l$ and $u^r$ of each vertex $u\in V$, of the same weight $w^b(u^l)=w^b(u^r)=w(u)$ in $(G^b,w^b)$; $V^l:=\{u^l: u\in V\}$, $V^r:=\{u^r: u\in V\}$, and $V^b:=V^l\cup V^r$. Each edge $\{u,v\}\in E$ of $G$ creates two edges in $G^b$, namely $\{u^l,v^r\}$ and $\{v^l,u^r\}$. Hence $E^b:=\{\{u^l,v^r\},\{v^l,u^r\}: \{u,v\}\in E\}$. For $U\subseteq V$ we also use $U^l$, $U^r$, and $U^b:=U^l\cup U^r$ for the corresponding sets of vertices.

With any set $C\subseteq V^l\cup V^r$ we associate a map $x_C:V\to\{0,\frac12,1\}$ in the following way: $x_C(u)=\frac12|C\cap\{u^l,u^r\}|$ for any $u\in V$. Clearly $w(x_C)=\frac12 w^b(C)$ for any $C\subseteq V^l\cup V^r$.

Lemma 4. (i) If $C$ is a vertex cover for $G^b$ then $x_C$ is a half-integral vertex cover for $G$ of weight $\frac12 w^b(C)$. In particular, $vc^*(G,w)\le\frac12 vc(G^b,w^b)$.

(ii) If $x:V\to\{0,\frac12,1\}$ is a half-integral vertex cover for $G$ then there is a vertex cover $C$ for $G^b$ such that $x_C=x$. Hence $\frac12 vc(G^b,w^b)\le vc^*(G,w)$.

(iii) $vc^*(G,w)=\frac12 vc(G^b,w^b)$.

(iv) The mapping $C\mapsto x_C$ maps $VC(G^b,w^b)$ onto $VC^*(G,w)$.
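Lemma 4(iii) gives a direct way to compute $vc^*$ in the unweighted case: by König's theorem, $vc(G^b)$ equals the size of a maximum matching of the bipartite graph $G^b$. The sketch below is a minimal illustration with a hypothetical function name; it uses Kuhn's augmenting-path matching for simplicity, matching each left copy $u^l$ to right copies $v^r$ with $v\in\Gamma(u)$, and returns half the matching size.

```python
from fractions import Fraction


def vc_star_unweighted(vertices, edges):
    """vc*(G) = vc(G^b) / 2 for unit weights, where vc(G^b) is the size of a
    maximum matching of the bipartite double cover G^b (Konig's theorem)."""
    adj = {u: [] for u in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    owner = {}  # right copy v^r -> left copy u^l currently matched to it

    def augment(u, seen):
        # The left copy u^l is adjacent to v^r for every neighbour v of u in G.
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if v not in owner or augment(owner[v], seen):
                    owner[v] = u
                    return True
        return False

    matching_size = sum(augment(u, set()) for u in vertices)
    return Fraction(matching_size, 2)


if __name__ == "__main__":
    triangle = [("a", "b"), ("b", "c"), ("a", "c")]
    print(vc_star_unweighted(["a", "b", "c"], triangle))  # 3/2
```

For the triangle, $G^b$ is a 6-cycle with a perfect matching of size 3, so $vc^*=\frac32$.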

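For a weighted graph $(G,w)$ the existence of a pivot is clear from its definition. By Lemma 1(ii), a pivot $x$ is determined by its $V^x_0$ part, as then $V^x_1=\Gamma(V^x_0)$ and $V^x_{1/2}=V\setminus(V^x_0\cup V^x_1)$. But Lemma 3(iii) implies that for every $y\in VC^*(G,w)$ we have $V^x_0\subseteq V^y_0$, $V^x_1\subseteq V^y_1$, and $V^x_{1/2}\supseteq V^y_{1/2}$; hence $V^x_0\subseteq\bigcap_{y\in VC^*(G,w)}V^y_0\ (\subseteq V^x_0)$, $V^x_1\subseteq\bigcap_{y\in VC^*(G,w)}V^y_1\ (\subseteq V^x_1)$, and $V^x_{1/2}\supseteq\bigcup_{y\in VC^*(G,w)}V^y_{1/2}\ (\supseteq V^x_{1/2})$. Therefore the pivot $x$ is unique and it defines the partition with $V^x_0=V^*_0(G,w):=\bigcap_{y\in VC^*(G,w)}V^y_0$,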




Improvement of Nemhauser- Trotter Theorem 179 



$V^x_1=V^*_1(G,w):=\bigcap_{y\in VC^*(G,w)}V^y_1$, and $V^x_{1/2}=V^*_{1/2}(G,w):=V\setminus(V^*_0(G,w)\cup V^*_1(G,w))$.

Denote the analogous parts of $V$ corresponding to the set $VC(G,w)$ of minimum vertex covers by $V_i(G,w)$, $i\in\{0,1,\frac12\}$, where $V_0(G,w)$ is the set of vertices avoided by every minimum vertex cover for $(G,w)$, $V_1(G,w)$ is the set of vertices contained in every minimum vertex cover for $(G,w)$, and $V_{1/2}(G,w)$ is the rest.

Remark 1. For a fixed weighted graph $(G,w)$ let $\varphi:V^b\to V^b$ denote the automorphism of $G^b$ defined by $\varphi(u^l)=u^r$, $\varphi(u^r)=u^l$ for each $u\in V$. For a fixed $u\in V$ we obtain $u^l\in V_0(G^b,w^b)$ iff $u^r\in V_0(G^b,w^b)$ iff (using Lemma 4) $u\in V^*_0(G,w)$; similarly $u^l\in V_1(G^b,w^b)$ iff $u^r\in V_1(G^b,w^b)$ iff $u\in V^*_1(G,w)$. In other words, for each $i\in\{0,1,\frac12\}$, $V_i(G^b,w^b)$ consists of the pairs corresponding to the vertices of $V^*_i(G,w)$.

Hence the set $V^x_0=V^*_0(G,w)$ for the pivot $x$ can be identified from $V_0(G^b,w^b)$, which can be computed efficiently due to the following lemma. Recall that for an unweighted bipartite graph it follows from the bipartite version of the classical Gallai-Edmonds Structure Theorem that the set $V_0$ coincides with the set of vertices avoided by at least one maximum matching.

Lemma 5. Let $G=(V,E)$ be a bipartite graph with vertex weights $w:V\to[0,\infty)$. The problem of finding the set $V_0$ of all vertices in $G$ that are avoided by every minimum vertex cover for $(G,w)$ is solvable in polynomial time; in the unweighted case in time $O(|E|\sqrt{|V|})$.

Clearly, with the set $V^x_0$ known, the sets $V^x_1=\Gamma(V^x_0)$ and $V^x_{1/2}$ can be easily constructed for the pivot $x$. Therefore, we obtain the following theorem.

Theorem 1. Let a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$ be given. Among the minimum half-integral vertex covers for $(G,w)$ there is exactly one pivot $x$. The corresponding partition $V=V^x_0\cup V^x_{1/2}\cup V^x_1$ according to $x$ has the following properties: $V^x_0=\bigcap_{y\in VC^*(G,w)}V^y_0$, $V^x_1=\bigcap_{y\in VC^*(G,w)}V^y_1=\Gamma(V^x_0)$, and $V^x_{1/2}=\bigcup_{y\in VC^*(G,w)}V^y_{1/2}$. Moreover, there is a polynomial time algorithm that finds the pivot $x$. In the unweighted case, its running time is $O(|E|\sqrt{|V|})$.
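In the unweighted case, Remark 1 and the matching characterization recalled before Lemma 5 suggest the following sketch of the pivot computation. It is an illustration, not the authors' implementation: the function name is hypothetical, and Kuhn's algorithm replaces Hopcroft-Karp, so the $O(|E|\sqrt{|V|})$ bound is not achieved. It builds $G^b$, finds a maximum matching, marks the vertices of $G^b$ avoided by at least one maximum matching (the exposed ones plus those reachable from them by even alternating paths), and returns $V^x_0=\{u: u^l\text{ marked}\}$, $V^x_1=\Gamma(V^x_0)$ and the rest as $V^x_{1/2}$.

```python
from collections import deque


def pivot_partition(vertices, edges):
    """(V0, V1, Vhalf) of the pivot for unit weights, via the double cover G^b."""
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # G^b: u^l -- v^r and v^l -- u^r for every edge {u, v}
    badj = {}
    for u in vertices:
        badj[("l", u)] = [("r", v) for v in adj[u]]
        badj[("r", u)] = [("l", v) for v in adj[u]]
    match = {}  # symmetric: match[a] == b iff match[b] == a

    def augment(a, seen):
        for b in badj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[a], match[b] = b, a
                    return True
        return False

    for u in vertices:
        augment(("l", u), set())
    # Vertices avoided by some maximum matching: exposed vertices and everything
    # reachable from them by an even alternating path (any edge, then a matching edge).
    avoided = {a for a in badj if a not in match}
    queue = deque(avoided)
    while queue:
        a = queue.popleft()
        for b in badj[a]:
            c = match.get(b)
            if c is not None and c not in avoided:
                avoided.add(c)
                queue.append(c)
    v0 = {u for u in vertices if ("l", u) in avoided}
    v1 = {v for u in v0 for v in adj[u]}
    return v0, v1, set(vertices) - v0 - v1


if __name__ == "__main__":
    star = [("c", "x"), ("c", "y"), ("c", "z")]
    print(pivot_partition(["c", "x", "y", "z"], star))
```

For a star the leaves form $V^x_0$ and the centre forms $V^x_1$; for a triangle no vertex of $G^b$ is exposed, so the pivot is $x\equiv\frac12$.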

Remark 2. If $G=(V,E)$ is a bipartite graph with bipartition $V=A\cup B$ and with vertex weights $w:V\to[0,\infty)$, then $(G^b,w^b)$ consists of two disjoint copies of $(G,w)$, namely $(G^b[A^l\cup B^r],w^b)$ and $(G^b[A^r\cup B^l],w^b)$. Therefore $vc(G^b,w^b)=2vc(G,w)$, and $vc^*(G,w)=vc(G,w)$ by Lemma 4(iii). Moreover, $u\in V_0(G,w)$ iff $u^l,u^r\in V_0(G^b,w^b)$ iff $u\in V^*_0(G,w)$, hence $V_0(G,w)=V^*_0(G,w)$. In the same way we get $V_1(G,w)=V^*_1(G,w)$.

We can summarize our previous results as follows: 

Theorem 2. There exists a polynomial time algorithm that partitions the vertex set $V$ of a given graph $(G,w)$ with vertex weights $w:V\to[0,\infty)$ into three subsets $V_0$, $V_1$, $V_{1/2}$ with no edges between $V_0$ and $V_{1/2}$ or within $V_0$, such that

(i) $vc(G[V_{1/2}],w)\ge vc^*(G[V_{1/2}],w)=\frac12 w(V_{1/2})$,

(ii) every minimum vertex cover $C$ for $(G,w)$ satisfies $V_1\subseteq C\subseteq V_1\cup V_{1/2}$, and $C\cap V_{1/2}$ is a minimum vertex cover for $(G[V_{1/2}],w)$,

(iii) every (minimum) vertex cover $C_{1/2}$ for $(G[V_{1/2}],w)$ together with $V_1$ forms a (minimum) vertex cover for $(G,w)$,

(iv) $V_0=\bigcap_{y\in VC^*(G,w)}V^y_0$, $V_1=\bigcap_{y\in VC^*(G,w)}V^y_1=\Gamma(V_0)$, $V_{1/2}=V\setminus(V_0\cup V_1)$,

(v) $\emptyset\neq U\subseteq V_1$ implies $w(V_0\cap\Gamma(U))>w(U)$.

In the unweighted case its time complexity is $O(|E|\sqrt{|V|})$. Moreover, if $G$ is bipartite, then $V_0$ is the set of all vertices that are avoided by every minimum vertex cover for $(G,w)$, $V_1$ is the intersection of all minimum vertex covers for $(G,w)$, and $vc(G[V_{1/2}],w)=\frac12 w(V_{1/2})$. In particular, if $G$ is an unweighted bipartite graph, then $V_0$ is the set of vertices in $G$ which are avoided by at least one maximum matching in $G$, and $G[V_{1/2}]$ has a perfect matching.

Lemma 1 shows in particular that $vc^*(G,w)<\frac12 w(V)$ iff $V^*_0(G,w)\neq\emptyset$ iff there is an independent set $I$ in $G$ such that $w(\Gamma(I))<w(I)$; and any of these conditions implies $V_0(G,w)\neq\emptyset$. It is of interest for us to have a structural characterization of the instances $(G,w)$ with $vc^*(G,w)=\frac12 w(V)$, as Min-w-VC reduces to such instances. Namely, $vc^*(G,w)=\frac12 w(V)$ iff $w(\Gamma(I))\ge w(I)$ holds for every independent set $I$ in $G$.

In the next section we refine the NT-Theorem in another direction, and the Min-w-VC problem will be reduced to even more restricted instances $(G,w)$, namely those with $w>0$ for which $x\equiv\frac12$ on $V$ is the unique element of $VC^*(G,w)$. They can be characterized similarly, using Lemma 1, as those instances $(G,w)$ for which $w(\Gamma(I))>w(I)$ holds for every nonempty independent set $I$ in $G$.
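Both characterizations can be tested by brute force on small graphs. In this sketch (hypothetical function name, exponential in $|V|$), `strict=False` checks $w(\Gamma(I))\ge w(I)$ for every nonempty independent set $I$, corresponding to $vc^*(G,w)=\frac12 w(V)$, and `strict=True` checks the strict inequality used for the restricted instances with $w>0$.

```python
from itertools import combinations


def neighbourhood_condition(vertices, edges, w, strict=False):
    """Check w(Gamma(I)) >= w(I) (or > w(I) if strict) for every nonempty
    independent set I, by enumerating all vertex subsets."""
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    for r in range(1, len(vertices) + 1):
        for I in combinations(vertices, r):
            if any(v in adj[u] for u, v in combinations(I, 2)):
                continue  # I is not independent
            gamma = set().union(*(adj[u] for u in I))
            lhs = sum(w[v] for v in gamma)
            rhs = sum(w[u] for u in I)
            if lhs < rhs or (strict and lhs == rhs):
                return False
    return True


if __name__ == "__main__":
    edge = (["a", "b"], [("a", "b")], {"a": 1, "b": 1})
    print(neighbourhood_condition(*edge), neighbourhood_condition(*edge, strict=True))
```

A single edge satisfies the non-strict condition but not the strict one, in line with $vc^*=\frac12 w(V)$ while $(1,0)$ and $(0,1)$ are also optimal, so $x\equiv\frac12$ is not unique.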



3 Decomposition into Irreducible Subgraphs 

The partition $V_0$, $V_1$, and $V_{1/2}$ of the vertex set $V$ from Theorem 2 satisfies $V_1\subseteq C\subseteq V_1\cup V_{1/2}$ for every minimum vertex cover $C$. In this case the problem kernel $V_{1/2}$ is the largest among all $V^y_{1/2}$, $y\in VC^*(G,w)$; in fact $V_{1/2}=\bigcup_{y\in VC^*(G,w)}V^y_{1/2}$. On the other hand, in the original NT-Theorem it is natural to search for such a decomposition with $V_{1/2}$ as small as possible. This is motivated by the fact that the Min-w-VC problem for $(G,w)$ reduces to the one for $(G[V_{1/2}],w)$. In what follows we will see that one can find in polynomial time $x\in VC^*(G,w)$ for which $V^x_{1/2}$ is the smallest among all $V^y_{1/2}$, $y\in VC^*(G,w)$, namely $V^x_{1/2}=\bigcap_{y\in VC^*(G,w)}V^y_{1/2}$. Furthermore, for the problem kernel $(G[V^x_{1/2}],w)$, $z\equiv\frac12$ on $V^x_{1/2}$ is the unique element of $VC^*(G[V^x_{1/2}],w)$. We also assume from now on that $w>0$, as vertices with weight 0 are not contained in $\bigcap_{y\in VC^*(G,w)}V^y_{1/2}$.

The following theorem summarizes the main results of this section. 







Theorem 3. There exists a polynomial time algorithm (running in time $O(|E|\sqrt{|V|})$ in the unweighted case) that for a graph $G=(V,E)$ with vertex weights $w:V\to(0,\infty)$ and $vc^*(G,w)=\frac12 w(V)$ constructs a partition

$V=K\cup\bigcup_{i=1}^{s}T_i\cup\bigcup_{i=1}^{s}S_i$

with the following properties:

(i) there is a minimum half-integral vertex cover $x$ for $(G,w)$ such that $V^x_0=\bigcup_{i=1}^{s}T_i$, $V^x_1=\bigcup_{i=1}^{s}S_i$, and $V^x_{1/2}=K$,

(ii) $K=\bigcap_{y\in VC^*(G,w)}V^y_{1/2}$. Moreover, if $K\neq\emptyset$, then $z\equiv\frac12$ on $K$ is the unique element of $VC^*(G[K],w)$, and $vc(G[K],w)>vc^*(G[K],w)=\frac12 w(K)$.

(iii) For each $i\in\{1,2,\dots,s\}$ the following holds true:

(a) $S_i=\Gamma(T_i)\setminus\bigcup_{j<i}S_j$,

(b) $w(T_i)=w(S_i)=vc(G[T_i\cup S_i],w)$,

(c) $\emptyset\neq T\subsetneq T_i$ implies $w(\Gamma(T)\cap S_i)>w(T)$,

(d) for every $C\in VC(G,w)$, $C\cap(T_i\cup S_i)$ is either $T_i$ or $S_i$, and if $S_i$ is not an independent set then $C\cap(T_i\cup S_i)=S_i$.

Remark 3. Under the assumptions of Theorem 3, we have $V^*_0(G,w)=V^*_1(G,w)=\emptyset$, and we cannot say much, in general, about $V_0(G,w)$ and $V_1(G,w)$. But in many cases the theorem gives us nontrivial information about $V_0(G,w)$ and $V_1(G,w)$. Let the corresponding partition $V=K\cup\bigcup_{i=1}^{s}T_i\cup\bigcup_{i=1}^{s}S_i$ be fixed. We will say that $i\in\{1,2,\dots,s\}$ is determined if for every $C\in VC(G,w)$, $C\cap(T_i\cup S_i)=S_i$ (i.e., $S_i\subseteq C$ and $T_i\cap C=\emptyset$). By part (iii)(d) of Theorem 3 we know that any $i$ for which $S_i$ is not an independent set is determined. Also, if $i$ is such that for some $k>i$ there exists an edge between $S_i$ and $S_k$ as well as an edge between $S_i$ and $T_k$, then $i$ is determined. Further, if $i$ is determined and $j<i$ is such that there exists an edge between $T_i$ and $S_j$, then $j$ is determined as well. These observations allow us in some cases to further reduce the kernel $(G[V_{1/2}],w)$ obtained in Theorem 2, as we can tell a priori for some $i$ that $T_i\subseteq V_0(G,w)$ and $S_i\subseteq V_1(G,w)$.

To explain some ideas behind the proof of Theorem 3, we study the Min-w-VC problem on weighted bipartite graphs with bipartition $V=L\cup R$ and satisfying $vc(G,w)=\frac12 w(V)$. In this setting an edge $\{u,v\}\in E$ is called allowed for $(G,w)$ if for every minimum vertex cover $C$ for $(G,w)$ only one of the vertices $u$ and $v$ belongs to $C$; otherwise it is called forbidden. Further, $(G,w)$ is called elementary if it has exactly two minimum vertex covers, namely $L$ and $R$. Clearly, if $(G,w)$ is elementary then $G$ is connected and every edge of $G$ is allowed. The notions of an allowed edge and an elementary graph come from decomposition theorems related to maximum matchings in unweighted graphs. See [12, Thm. 4.1.1] for the proof that our notions are equivalent in the case of unweighted bipartite graphs with a perfect matching.

We also need to prove the following generalization to weighted graphs of the classical Dulmage-Mendelsohn Decomposition Theorem for bipartite graphs, where we focus on minimum vertex covers rather than on maximum matchings.

Theorem 4. Let $G=(V,E)$ be a bipartite graph with bipartition $V=L\cup R$ and vertex weights $w:V\to(0,\infty)$. Assume that $vc(G,w)=\frac12 w(V)$. The subgraph of $G$ consisting of all allowed edges for $(G,w)$ has components, called blocks, $B_i=G[L_i\cup R_i]$ for $i=1,2,\dots,r$ (here $\bigcup_{i=1}^{r}L_i=L$ and $\bigcup_{i=1}^{r}R_i=R$ are partitions). Each weighted block $(B_i,w)$ is an elementary graph, and the ordering $B_1,B_2,\dots,B_r$ can be chosen with the following property: every edge in $G$ between two blocks $B_i$ and $B_j$ with $i<j$ must have its $R$-vertex in $B_i$ and its $L$-vertex in $B_j$. The decomposition into blocks and their admissible ordering can be constructed in polynomial time; in the unweighted case in time $O(|E|\sqrt{|V|})$.



Remark 4. In Theorem 4, $L$ and $R$ are always in $VC(G,w)$, but if $(G,w)$ is not elementary there are also "intermediate" minimum vertex covers. Namely, each $C_t:=\bigcup_{k=1}^{t}R_k\cup\bigcup_{k=t+1}^{r}L_k$, $t=0,1,\dots,r$, belongs to $VC(G,w)$.

Let us mention some of the consequences of Theorem 3. For a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$ we first apply Theorem 2 to obtain $(G[V_{1/2}],w)$, whose positive-weight vertices satisfy the assumption of Theorem 3. This reduces the Min-w-VC problem for $(G,w)$ to the one for $(G[K],w)$, for which $x\equiv\frac12$ on $K$ is the unique element of $VC^*(G[K],w)$. Moreover, the difference $vc(G,w)-vc^*(G,w)$ is preserved: it is the same as $vc(G[K],w)-vc^*(G[K],w)$, which is zero iff $K=\emptyset$. Hence we have the following corollary.

Corollary 1. There is a polynomial time algorithm (of time complexity $O(|E|\sqrt{|V|})$ in the unweighted case) that for a graph $G=(V,E)$ with vertex weights $w:V\to[0,\infty)$ decides whether $vc(G,w)=vc^*(G,w)$, and if the equality holds, finds one minimum vertex cover for $(G,w)$.



Remark 5. Since a (maximum) independent set for $(G,w)$ is the complement of a (minimum) vertex cover for $(G,w)$, all results above can be translated in an obvious way to results for the Maximum Weighted Independent Set problem.

To describe some of the possible applications, we confine ourselves to the unweighted version of the Minimum Vertex Cover problem in the rest of the paper.

4 Parametrized Complexity and Vertex Covers 

The Minimum Vertex Cover problem and its variants play a very special role 
among fixed-parameter tractable problems. Let us recall the basic parametrized 
version of the problem: 

Instance: A graph G = (V, E) and a nonnegative integer k 







Question (for decision version): Is there a vertex cover for G with at most k 
vertices? 

Task (for search version): Either find a vertex cover for G with at most k vertices 
or report that no such vertex cover exists. 

Recently, there has been increasing interest and progress in lowering the exponential running time of algorithms that solve NP-hard optimization problems, like Min-VC, exactly. One of the most important methods employed in the development of efficient parametrized algorithms for such problems is reduction to a problem kernel. For the parametrized decision version of the vertex cover problem this means applying an efficient preprocessing to the instance $(G,k)$ to construct another instance $(G_1,k_1)$, where $G_1$ is a subgraph of $G$ (the kernel), $k_1\le k$, and $G_1$ has a vertex cover with at most $k_1$ vertices iff $G$ has a vertex cover with at most $k$ vertices. As observed in [5], the Nemhauser-Trotter Theorem allows one to find efficiently a linear size problem kernel for Min-VC. Namely, there is an algorithm of running time $O(k|V|+k^3)$ that, given an instance $(G=(V,E),k)$, constructs another instance $(G'=(V',E'),k')$ with the following properties: $G'$ is an induced subgraph of $G$, $|V'|\le 2k'$, $k'\le k$, and $G$ admits a vertex cover of size $k$ iff $G'$ admits a vertex cover of size $k'$. Clearly, using the same technique one can solve the parametrized search version of the vertex cover problem, or the problem of finding a minimum vertex cover of $G$ if $vc(G)\le k$, or reporting that $vc(G)>k$.

Unlike the Nemhauser-Trotter Theorem, Theorem 2 can be used as an efficient reduction to a linear size problem kernel for the following problem: find all minimum vertex covers if $vc(G)\le k$, or report that $vc(G)>k$.

Parametrized All-Min-VC problem
Instance: A graph $G=(V,E)$ and a nonnegative integer $k$

Task: Either find all minimum vertex covers for $G$ if $vc(G)\le k$, or report that $vc(G)>k$.

Theorem 5. There is an algorithm of running time $O(k|V|+k^3)$ that for a given instance $(G=(V,E),k)$ either reports that $vc(G)>k$, or finds a partition $V=N\cup Y\cup V'$ such that $G':=G[V']$, $k':=k-|Y|$, $vc(G')\ge\frac12|V'|$, and $|V'|\le 2k'$. Moreover, $vc(G)\le k$ iff $vc(G')\le k'$, and assuming $vc(G)\le k$:

(i) for every minimum vertex cover $C'$ for $G'$: $C'\cup Y\in VC(G)$, and

(ii) for every minimum vertex cover $C$ for $G$: $Y\subseteq C\subseteq Y\cup V'$ and $C\cap V'\in VC(G')$.

Proof. Let an instance $(G=(V,E),k)$ be given. Clearly, every vertex $v\in V$ of degree at least $k+1$ has to belong to every vertex cover of size at most $k$, provided $vc(G)\le k$. Denote by $Y''$ the set of vertices of $G$ of degree at least $k+1$, by $N''$ the set of isolated vertices of $G[V\setminus Y'']$, let $V'':=V\setminus(Y''\cup N'')$ and $k''=k-|Y''|$. First, in running time $O(k|V|)$ we can construct the graph $G''=(V'',E''):=G[V'']$ (see, e.g., Buss [2] for such a simple algorithm). Clearly, $vc(G)\le k$ iff $vc(G'')\le k''$, and assuming $vc(G)\le k$: (i) for every $C''\in VC(G'')$, $C''\cup Y''\in VC(G)$, and (ii) for every $C\in VC(G)$, $Y''\subseteq C\subseteq Y''\cup V''$ and $C\cap V''\in VC(G'')$.

Each vertex of $G''$ has degree at most $k$. Hence $vc(G'')\le k''$ is only possible if $|E''|\le kk''$. If $|E''|>kk''$, we can report that $vc(G)>k$ and the algorithm terminates. Otherwise, we have $|E''|\le k\cdot k''\ (\le k^2)$, and since $G''$ does not contain isolated vertices, it follows that $|V''|\le 2|E''|\le 2k^2$. Now we apply Theorem 2 to the graph $G''$ (with $w\equiv 1$). Namely, we partition the vertex set $V''$ into three subsets $V_0$, $V_1$, $V_{1/2}$ in time $O(|E''|\sqrt{|V''|})=O(k^3)$. Further, we put $Y:=Y''\cup V_1$, $N:=N''\cup V_0$, $V':=V_{1/2}$, $G':=G[V']$, and $k':=k''-|V_1|=k-|Y|$. Obviously, $vc(G)\le k$ iff $vc(G')\le k'$, and from Theorem 2 also $vc(G')\ge\frac12|V'|$. This means that if $|V'|>2k'$, we can report that $vc(G')>k'$, hence $vc(G)>k$, and the algorithm terminates. Otherwise $|V'|\le 2k'$ holds, as was required. All other properties follow directly from Theorem 2. □
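The two phases of the proof can be sketched as follows for small inputs. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, Buss's step is the single pass described in the proof, an explicit check for $|Y''|>k$ is added, and the Theorem 2 step uses the double-cover and alternating-path construction with Kuhn's matching algorithm, so the $O(k|V|+k^3)$ bound is not attained.

```python
from collections import deque


def pivot_partition_from_adj(vertices, adj):
    """(V0, V1, Vhalf) of the pivot for unit weights, via the double cover G^b."""
    badj = {}
    for u in vertices:
        badj[("l", u)] = [("r", v) for v in adj[u]]
        badj[("r", u)] = [("l", v) for v in adj[u]]
    match = {}

    def augment(a, seen):
        for b in badj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[a], match[b] = b, a
                    return True
        return False

    for u in vertices:
        augment(("l", u), set())
    avoided = {a for a in badj if a not in match}  # exposed vertices of G^b
    queue = deque(avoided)
    while queue:  # even alternating paths from exposed vertices
        a = queue.popleft()
        for b in badj[a]:
            c = match.get(b)
            if c is not None and c not in avoided:
                avoided.add(c)
                queue.append(c)
    v0 = {u for u in vertices if ("l", u) in avoided}
    v1 = {v for u in v0 for v in adj[u]}
    return v0, v1, set(vertices) - v0 - v1


def vc_kernel(vertices, edges, k):
    """Return (N, Y, V', k') as in Theorem 5, or None if vc(G) > k."""
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Buss step: vertices of degree >= k + 1 are forced into every small cover.
    y2 = {v for v in vertices if len(adj[v]) >= k + 1}
    k2 = k - len(y2)
    if k2 < 0:
        return None  # added guard: more than k forced vertices
    rest = [v for v in vertices if v not in y2]
    sub = {v: adj[v] - y2 for v in rest}  # adjacency in G[V \ Y'']
    n2 = {v for v in rest if not sub[v]}  # isolated vertices of G[V \ Y'']
    v2 = [v for v in rest if sub[v]]
    e2 = sum(len(sub[v]) for v in v2) // 2
    if e2 > k * k2:
        return None  # max degree <= k, so k'' vertices cover at most k * k'' edges
    # Theorem 2 with unit weights applied to G''.
    v0, v1, vhalf = pivot_partition_from_adj(v2, sub)
    y, n = y2 | v1, n2 | v0
    k_prime = k - len(y)
    if len(vhalf) > 2 * k_prime:
        return None  # vc(G') >= |V'| / 2 > k'
    return n, y, vhalf, k_prime


if __name__ == "__main__":
    star = [("c", "x"), ("c", "y"), ("c", "z")]
    print(vc_kernel(["c", "x", "y", "z"], star, 3))
```

For the star with $k=3$ no vertex has degree $k+1$, so Buss's step removes nothing, and the Theorem 2 step puts the centre into $Y$ and the leaves into $N$, leaving an empty kernel with $k'=2$.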

Theorem 5 can be used as a reduction to a linear size problem kernel for many other parametrized problems related to Min-VC. A typical example is the problem whose task is to find one minimum vertex cover for $G$ under some additional constraints.

Parametrized Constrained-Min-VC problem

Instance: $(G=(V,E),k)$, $k$ a nonnegative integer, and finitely many linear constraints $P_1,P_2,\dots,P_r$ of the form $\sum_{v\in V}a_i(v)x(v)\le b_i$, $i=1,2,\dots,r$, where $a_i(v),b_i\in\mathbb{R}$.

Task: If $vc(G)\le k$, find $C$ from $VC(G)$ whose indicator function $x=\chi_C$ satisfies all constraints $P_1,P_2,\dots,P_r$; otherwise report that no such minimum vertex cover exists.

The most natural case is when each $a_i(v)$ is either 0 or 1, and the $b_i$ are nonnegative integers. Then constraint $P_i$ says that $|C\cap A_i|\le b_i$ for the set $A_i:=\{v\in V: a_i(v)=1\}$ and for the vertex cover $C\in VC(G)$ to be found. The problem has received considerable attention even in its very simplified version, when $G=(V,E)$ is a bipartite graph with bipartition $(L,R)$, and two nonnegative integers $k_L$ and $k_R$ (with $k=k_L+k_R$) are given as input. The integers $k_L$ and $k_R$ represent the constraints $|C\cap L|\le k_L$ and $|C\cap R|\le k_R$ on the $C\in VC(G)$ to be found. This problem arises from the extensively studied fault coverage problem for reconfigurable memory arrays in VLSI design; see [5] and the references therein.

Theorem 5 clearly allows an efficient reduction to a linear size problem kernel for Parametrized Constrained-Min-VC. Namely, $(G=(V,E),k)$ with constraints $P_i:\sum_{v\in V}a_i(v)x(v)\le b_i$, $i=1,2,\dots,r$, is reduced using Theorem 5 to $(G'=(V',E'),k')$ with $|V'|\le 2k'\ (\le 2k)$, and with constraints $P'_i:\sum_{v\in V'}a_i(v)x(v)\le b'_i\ (:=b_i-\sum_{v\in Y}a_i(v))$, $i=1,2,\dots,r$.
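Since every minimum vertex cover contains $Y$ and avoids $N$ (Theorem 5(ii)), we have $x\equiv 1$ on $Y$ and $x\equiv 0$ on $N$, so each constraint keeps only its terms on $V'$ and its right-hand side drops by $\sum_{v\in Y}a_i(v)$. A minimal sketch with a hypothetical representation (each constraint as a pair of a coefficient dictionary and a bound; the function name is also hypothetical):

```python
def reduce_constraints(constraints, forced_in, kernel):
    """Turn P_i: sum_v a_i(v) x(v) <= b_i into P'_i over the kernel V'.

    constraints: list of (a, b), where a maps vertices to coefficients
    forced_in:   the set Y (x = 1 there in every minimum cover)
    kernel:      the set V' (vertices of N contribute 0, since x = 0 there)
    """
    reduced = []
    for a, b in constraints:
        b_prime = b - sum(a.get(v, 0) for v in forced_in)
        a_prime = {v: c for v, c in a.items() if v in kernel}
        reduced.append((a_prime, b_prime))
    return reduced


if __name__ == "__main__":
    print(reduce_constraints([({"y": 1, "u": 2, "n": 5}, 4)], {"y"}, {"u"}))
```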



Further Research 

Seemingly a new technique, called the crown reduction (also crown decomposition, crown rules), has been recently introduced in [6] and [1] for the (unweighted) Vertex Cover problem. A crown decomposition in a graph $G=(V,E)$ is a partitioning of the vertex set $V$ into three sets $V_0$, $V_1$, and $V_{1/2}$ satisfying the following conditions: (1) $V_0$ (the crown) is an independent set; (2) $V_1$ (the head) separates $V_0$ from $V_{1/2}$ (the rest), i.e., there are no edges between $V_0$ and $V_{1/2}$; and (3) there is a matching of $|V_1|$ edges from $V_1$ to $V_0$.
It can be seen from Lemma 1 that any minimum half-integral vertex cover $x$ defines such a partition ((3) easily follows from Lemma 1(iv)). Hence we not only generalize the NT-Theorem, but also link this well-studied theorem closely to the crown reduction technique. From this link and our theorems it also easily follows that if a graph admits a crown decomposition, then a crown decomposition can be computed in polynomial time. Applying the reduction rules of Theorem 2 and Theorem 3, one can obtain an irreducible instance in which $|\Gamma(I)|>|I|$ for every nonempty independent set $I$.
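Conditions (1)-(3) are easy to verify for a proposed partition. The sketch below (hypothetical function name; unweighted graphs) checks independence of the crown, the absence of crown-rest edges, and, for condition (3), runs Kuhn's augmenting-path algorithm on the head-crown edges: all head vertices get matched iff a matching of $|V_1|$ edges exists. It only verifies a given triple and does not search for one.

```python
def is_crown_decomposition(vertices, edges, crown, head, rest):
    """Check conditions (1)-(3) for a partition V = crown | head | rest."""
    crown, head, rest = set(crown), set(head), set(rest)
    if crown | head | rest != set(vertices) or len(crown) + len(head) + len(rest) != len(vertices):
        return False  # not a partition of V
    adj = {u: set() for u in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    if any(adj[u] & crown for u in crown):
        return False  # (1) the crown is not independent
    if any(adj[u] & rest for u in crown):
        return False  # (2) an edge joins the crown and the rest
    owner = {}  # crown vertex -> head vertex matched to it

    def augment(h, seen):
        for c in adj[h] & crown:
            if c not in seen:
                seen.add(c)
                if c not in owner or augment(owner[c], seen):
                    owner[c] = h
                    return True
        return False

    # (3) Kuhn's algorithm: every head vertex matched iff a matching of |head| edges exists
    return all(augment(h, set()) for h in head)


if __name__ == "__main__":
    path = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
    print(is_crown_decomposition(*path, {"a"}, {"b"}, {"c", "d"}))
```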

Let us mention that the crown reduction technique can be applied effectively to other parametrized problems (see [3]). Moreover, our new decomposition theorems for vertex covers have connections with "parametrized enumeration" in the sense of listing all minimal solutions, as also discussed in [8]. We believe that the technique of this paper may be a powerful tool for designing algorithms for other fixed-parameter tractable problems that are related to the Min-VC problem.
Abstract. The first polynomial time algorithm ($O(n^4)$) for modular decomposition appeared in 1972 [8], and since then there have been incremental improvements, eventually resulting in linear-time algorithms [22,7,23,9]. Although they have optimal time complexity, these algorithms are quite complicated and difficult to implement. In this paper we present an easily implementable linear-time algorithm for modular decomposition. This algorithm uses the notion of factorizing permutation and a new data-structure, the Ordered Chain Partitions.



1 Introduction 

The notion of module naturally arises from different combinatorial structures [26] and appears under the names of autonomous sets, homogeneous sets, intervals, partitive sets, clans, etc. Modular decomposition is often the first algorithmic step for many graph problems, including recognition, decision and optimization problems. Indeed, it plays an important role in various graph recognition algorithms (e.g., cographs [5], interval graphs [25], permutation graphs [29] and other classes of perfect graphs [12,2]), and in the transitive orientation problem (see [11,23]). The interested reader should refer to [26] for a survey on modular decomposition.

For a few years, linear-time algorithms have been known to exist ([22,7,23,9]), but they remain rather complicated. Therefore, in the late 90's, a series of authors attempted to design practical modular decomposition algorithms, even quasi-linear ones. In [24], an $O(n+m\log n)$ algorithm was proposed, while [16,9] obtained an $O(n+m\,\alpha(n,m))$ complexity bound (where $\alpha(n,m)$ is the inverse Ackermann function). Such a phenomenon in the algorithmic progress for a given problem is quite common. For example, the first linear-time algorithm for the interval graph recognition problem appeared in 1976 [1]. This algorithm uses sophisticated data-structures, namely PQ-trees. Since then, successive simplifications have been proposed [20,6,14]. One can also refer to planarity: the first linear-time planarity testing algorithms, which appeared in the early 70's [19,1], are rather complicated, and simpler algorithms have been designed later. Designing optimal but simple algorithms is a great algorithmic challenge. It was still an open problem to design
* For a full version of this extended abstract, see [15] 
a very simple linear-time modular decomposition algorithm. We propose one (depicted in Figure 7) in this paper.

Any graph $G=(V,E)$ considered here will be simple and undirected, with $n=|V|$ vertices and $m=|E|$ edges. The complement of a graph $G$ is denoted by $\overline{G}$. If $A$ is a subset of vertices, then $G[A]$ is the subgraph of $G$ induced by $A$. Let $x$ be an arbitrary vertex; then $N(x)$ and $\overline{N}(x)$ stand respectively for the neighborhood of $x$ and its non-neighborhood. A vertex $x$ distinguishes two vertices $u$ and $v$ iff $(x,u)\in E$ and $(x,v)\notin E$. A module $M$ of a graph $G$ is a set of vertices that is not distinguished by any vertex.
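The module definition translates into a direct check. This sketch (hypothetical function name) adopts the usual reading that only vertices outside $M$ are considered as distinguishers, which is an assumption about the intended convention; a vertex $x\notin M$ distinguishes two vertices of $M$ exactly when it is adjacent to some but not all of $M$.

```python
def is_module(adj, M):
    """True iff no vertex outside M distinguishes two vertices of M.

    adj maps each vertex to the set of its neighbours."""
    M = set(M)
    for x in adj:
        if x in M:
            continue
        inside = adj[x] & M
        # x is adjacent to some vertex of M but not to all of them
        if inside and inside != M:
            return False
    return True


if __name__ == "__main__":
    path = {1: {2}, 2: {1, 3}, 3: {2}}  # path 1 - 2 - 3
    print(is_module(path, {1, 3}), is_module(path, {1, 2}))
```

In the path $1-2-3$, the set $\{1,3\}$ is a module, while $\{1,2\}$ is not, because vertex 3 distinguishes 2 from 1.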

The modules of a graph form a potentially exponential-sized family. However, the sub-family of strong modules, the modules that overlap¹ no other module, has size $O(n)$. The inclusion order of this family defines the modular tree decomposition, which is enough to store the module family of a graph [26]. The root of this tree is the trivial module $V$ and its $n$ leaves are the trivial modules $\{x\}$, $x\in V$. It is well-known that any graph $G$ with at least three vertices




Fig. 1. A graph and its modular tree decomposition. The set $\{1,2\}$ is a strong module. The module $\{7,8\}$ is weak: it is overlapped by the module $\{8,9\}$. The permutation $\sigma=(1,2,3,4,5,6,7,8,9)$ is a modular factorizing permutation.



either is not connected ($G$ is obtained from a parallel composition of its connected components); or its complement $\overline{G}$ is not connected ($G$ is obtained from a series composition of the connected components of $\overline{G}$); or $G$ and $\overline{G}$ are both connected. In the last case, the maximal (with respect to inclusion) modules define a partition of the vertex set and are said to form a prime composition. It follows that the modular decomposition tree can be built recursively by a top-down approach: at each step, the algorithm recurses on the graphs induced by the maximal strong modules. Such a technique gives an $O(n^4)$ algorithm in [8], the first polynomial-time algorithm of a list that counts dozens of them (see [27]).

The idea of modular factorizing permutation was introduced in [3]: a permutation $\sigma$ of the vertices of the graph such that, for each strong module $M$, the reverse image $\sigma^{-1}(M)$ is an interval of $\mathbb{N}$. It is clear that a DFS on the modular tree decomposition orders the leaves as a modular factorizing permutation. Conversely, [4] proposed a simple linear-time algorithm that, given a graph and one of its factorizing permutations, computes the modular decomposition tree. The idea is to approximate the smallest strong module containing two vertices



¹ $A$ overlaps $B$ if $A\cap B\neq\emptyset$, $A\setminus B\neq\emptyset$ and $B\setminus A\neq\emptyset$.
$x$ and $y$ using the interval of $x,y$-distinguishers². This algorithm first computes
a bracketing of the factorizing permutation: an opening parenthesis is written
before the first $x,y$-distinguisher, and a closing parenthesis after the last, for
all $x$ and $y$ that are consecutive in $\sigma$. This bracketing defines a tree, and
with a few node operations one can produce the modular decomposition tree
(in other words, a factorizing permutation can be seen as the compression of the
modular decomposition tree into a linear structure). It follows that the modular
decomposition problem reduces to the computation of a modular factorizing
permutation. In some cases such a permutation is given for free. In the case of
chordal graphs, any Cardinality Lexicographic BFS yields a modular factorizing
permutation [20]. [10,13] used a similar notion to decompose an inheritance
graph into blocks or modules. Recently, it has been shown that for some families
of intersection graphs (namely interval graphs and permutation graphs) whose
intersection model requires $O(n)$ space, a factorizing permutation can easily be
retrieved from the model, yielding an $O(n)$ algorithm that computes the modular
decomposition tree (see [27] or [21] for similar results).
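The bracketing step can be illustrated with a naive Python sketch that follows the description above literally. It spends $O(n)$ time per consecutive pair, so it is not the linear-time procedure of [4]; the adjacency representation (a dict of neighbour sets) and the name `bracketing` are assumptions, and building the tree from the brackets is not shown.

```python
def bracketing(adj, sigma):
    """Parenthesize the permutation sigma (naive sketch).

    For every pair x, y that is consecutive in sigma, an opening parenthesis
    goes before the first x,y-distinguisher and a closing one after the last.
    """
    pos = {v: k for k, v in enumerate(sigma)}
    opens = [0] * len(sigma)
    closes = [0] * len(sigma)
    for x, y in zip(sigma, sigma[1:]):
        # z distinguishes x and y if it is adjacent to exactly one of them
        dist = [pos[z] for z in sigma
                if z not in (x, y) and (z in adj[x]) != (z in adj[y])]
        if dist:
            opens[min(dist)] += 1
            closes[max(dist)] += 1
    tokens = []
    for k, v in enumerate(sigma):
        tokens += ["("] * opens[k] + [v] + [")"] * closes[k]
    return " ".join(tokens)


# The path a-b-c-d, whose only modules are trivial
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(bracketing(adj, ["a", "b", "c", "d"]))  # ( a ( b ) ( c ) d )
```

Pairs without any distinguisher contribute no parentheses in this sketch.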

We propose here the first linear-time algorithm that computes a modular
factorizing permutation without computing the underlying decomposition tree.
Combined with the algorithm of [4], it therefore yields a simple linear-time
modular decomposition algorithm in two steps (first the modular factorizing
permutation, then the modular decomposition tree). Using a new data structure,
the Ordered Chain Partition, we reduce the complexity to linear time, as
explained below. Avoiding the computation of the decomposition tree in the first
step provides a real simplification of the modular decomposition algorithm.
Indeed, as in [16], easy vertex partitioning rules are used. An implementation of
this algorithm is available at http://www.lirmm.fr/~montgolfier/algos/

2 Module-Factorizing Orders 

Let $G = (V, E)$ be a graph and let $O$ be a partial order on $V$. For two comparable
elements $x$ and $y$ with $x \prec_O y$, we say that $x$ precedes $y$ and $y$ follows $x$. Two subsets
$A$ and $B$ cross if $\exists a, a' \in A$ and $\exists b, b' \in B$ such that $a \prec_O b$ and $a' \succ_O b'$. A
linear extension of a poset is a completion of the poset into a total order.

Definition 1. A partial order $O$ is a Module-Factorizing Partial Order
(MFPO) of $V(G)$ if no two non-intersecting strong modules of $G$
cross.

The modular factorizing permutations (hereafter factorizing permutations for
short) are exactly the module-factorizing total orders.

Proposition 1. A partial order O is an MFPO if and only if it can be completed 
into a factorizing permutation. 
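Definition 1 can be checked by brute force on small examples. In this minimal sketch the order is passed as a precedence predicate and the strong modules are assumed to be supplied (computing them is the hard part). For the star $K_{1,3}$ with centre $c$ and leaves $a, b, d$, the strong modules are the singletons, $\{a, b, d\}$ and the whole vertex set. The names `crosses`, `is_mfpo` and `total_order` are hypothetical.

```python
def crosses(A, B, precedes):
    """A and B cross if a < b and a' > b' for some a, a' in A and b, b' in B."""
    return (any(precedes(a, b) for a in A for b in B)
            and any(precedes(b, a) for a in A for b in B))


def is_mfpo(strong_modules, precedes):
    """Definition 1: no two non-intersecting strong modules cross."""
    mods = [frozenset(M) for M in strong_modules]
    return not any(crosses(M, N, precedes)
                   for k, M in enumerate(mods) for N in mods[k + 1:]
                   if not (M & N))


def total_order(sequence):
    """Precedence predicate of a permutation given as a list."""
    rank = {v: k for k, v in enumerate(sequence)}
    return lambda x, y: rank[x] < rank[y]


# Strong modules of the star K_{1,3} with centre c
strong = [{"a"}, {"b"}, {"c"}, {"d"}, {"a", "b", "d"}, {"a", "b", "c", "d"}]
print(is_mfpo(strong, total_order(["a", "b", "d", "c"])))  # True
print(is_mfpo(strong, total_order(["a", "c", "b", "d"])))  # False
```

In the second order the singleton $\{c\}$ sits between two vertices of $\{a, b, d\}$, so the two strong modules cross, which is exactly why that permutation is not factorizing.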

² An $x,y$-distinguisher has exactly one neighbor in $\{x, y\}$; it therefore belongs to the
smallest strong module containing $x$ and $y$.
Our algorithm starts with the trivial partial order (with no comparison between
any pair of vertices), which is an MFPO. Then the order is extended with
new comparisons between vertices. When the order is total, the process stops.
Proving that the extensions preserve the module-factorizing property shows the
correctness of the algorithm, namely that the final order is a factorizing
permutation.



3 Towards a Linear-Time Algorithm 



In [16], an $O(n + m \log n)$ algorithm, based on partition refinement techniques
[28], was proposed to compute a modular factorizing permutation. This algorithm
uses a restricted class of MFPOs: the ordered partitions [28]. They are
easy to handle with a simple implementation in which most operations can be
performed in $O(1)$ time. This section describes the main techniques of [16], which
are also used by our algorithm.

Definition 2. An ordered partition is a collection $\{P_1, \dots, P_k\}$ of pairwise disjoint
parts, with $V = P_1 \uplus \dots \uplus P_k$, and an order $O$ such that for all $x \in P_i$ and
$y \in P_j$, $x \prec_O y$ iff $i < j$.

The algorithm of [16] starts with the trivial partition (a single part equal to
the vertex set) and iteratively extends (or refines) it until every part is a
singleton. A center vertex $c \in V$ is distinguished, and two refining rules, preserving
the MFPO property, are used. These rules are defined by Lemma 1.

Lemma 1. [16]

1. Center Rule: For any vertex $c$, the ordered partition $\overline{N}(c) \uplus \{c\} \uplus N(c)$ is
module-factorizing.

2. Pivot Rule: Let $O = P_1 \uplus \dots \uplus \{c\} \uplus \dots \uplus P_k$ be an ordered partition with
center $c$ and let $p \in P_i$ be such that $P_j$, $i \neq j$, overlaps $N(p)$. If $O$ is an MFPO,
then the following refinements preserve the module-factorizing property:

a) if $P_i \prec_O P_j \prec_O \{c\}$ or $\{c\} \prec_O P_j \prec_O P_i$, then replace $P_j$ by
$(N(p) \cap P_j) \uplus (\overline{N}(p) \cap P_j)$ (in that order),

b) otherwise replace $P_j$ by $(\overline{N}(p) \cap P_j) \uplus (N(p) \cap P_j)$ (in that order).



Fig. 2. The refinement rules defined in [16] (panels: the center rule, $\overline{N}(c)$, $c$, $N(c)$; the pivot rule, case (a)).
The center rule picks a center and breaks the trivial partition to start the
algorithm. Once launched, the process continues with the pivot rule, which
splits each part $P_j$ (except the part $P_i$ that contains the pivot) according
to the neighborhood of the pivot. When the algorithm of Figure 3 ends, every
part is a module. To obtain a factorizing permutation, it has to be relaunched
recursively on the non-singleton parts. The complexity depends on the
choice of the part $P$ (line 4 of the algorithm of Figure 3). Using Hopcroft's rule [18],
[16] achieves an $O(m \log n)$ time complexity.



Refine(G, O = {V})

1. Pick a center c
2. Extend O using the center rule with c
3. Repeat
4.    Select a part P
5.    For each p in P Do
6.       Extend O using the pivot rule with p
7. until no pivot rule can extend O any more



Fig. 3. Partition refinement scheme of [16]. It outputs a partition of V into the maximal 
modules not containing c. 
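The scheme of Figure 3, followed by the recursive relaunch on non-singleton parts, can be sketched in Python as below. This is a naive illustration under stated assumptions: the graph is a dict mapping each vertex to its set of neighbours, parts are not selected with Hopcroft's rule (so the sketch does not reach the $O(m \log n)$ bound), and a part $P_j$ is split whenever it contains both neighbours and non-neighbours of the pivot. The names `refine` and `factorizing_permutation` are ours.

```python
def refine(adj, vertices, c):
    """Scheme of Fig. 3 (naive): returns an ordered partition, as a list of
    frozensets, whose parts other than {c} are modules of the graph adj."""
    rest = set(vertices) - {c}
    # Center rule: non-neighbours of c, then c, then neighbours of c
    parts = [frozenset(rest - adj[c]), frozenset({c}), frozenset(rest & adj[c])]
    parts = [P for P in parts if P]
    changed = True
    while changed:                      # stop when no pivot rule applies
        changed = False
        for p in vertices:
            i = next(k for k, P in enumerate(parts) if p in P)
            ci = parts.index(frozenset({c}))
            new_parts = []
            for j, P in enumerate(parts):
                inside, outside = P & adj[p], P - adj[p]
                if j == i or not inside or not outside:
                    new_parts.append(P)                 # p does not split P
                elif i < j < ci or ci < j < i:
                    new_parts += [inside, outside]      # pivot rule, case (a)
                else:
                    new_parts += [outside, inside]      # pivot rule, case (b)
            changed = changed or len(new_parts) != len(parts)
            parts = new_parts
    return parts


def factorizing_permutation(adj, vertices):
    """Relaunch refine on the subgraph induced by each non-singleton part."""
    if len(vertices) <= 1:
        return list(vertices)
    sub = set(vertices)
    local = {v: adj[v] & sub for v in vertices}   # induced subgraph
    order = []
    for P in refine(local, vertices, vertices[0]):
        order += factorizing_permutation(local, sorted(P))
    return order


# A 4-cycle 0-1-3-2-0 with a pendant vertex 4 attached to 3; {1, 2} is a module
adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2, 4}, 4: {3}}
print(factorizing_permutation(adj, [0, 1, 2, 3, 4]))  # [4, 3, 0, 2, 1]
```

The first center is simply the first vertex of the list; any choice is allowed by the scheme.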



4 Ordered Chain Partition and Linear-Time Algorithm 

To bring the complexity down to linear time, our algorithm uses each vertex
only a constant number of times as a pivot. This algorithm is depicted in Figure 7.
An execution example is presented in Figure 10.

Definition 3. An ordered chain partition (OCP) is a partial order such that
each vertex belongs to exactly one chain, and each chain belongs to exactly
one part. The vertices of the same chain are totally ordered, the chains of
the same part are incomparable, and the parts are totally ordered.



Fig. 4. An Ordered Chain Partition (parts and chains).



A trivial chain contains only one vertex, and a monochain part contains
only one chain. OCPs generalize ordered partitions, since the latter
contain only trivial chains. $C(x)$ denotes the chain containing $x$, while
$P(x)$ denotes the part of the partition containing $x$. Each chain $C$ has a
representative vertex $r(C) \in C$. During the algorithm, the chains behave
like their representative vertex (the tests for the refinement rules are done on
the representatives). Notice that chains may be merged. In that case, the
representative of the new chain is one of the former representatives (in fact it
is the old center). But chains are never split.
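As a reading aid for Definition 3, here is a minimal Python sketch of the order that an OCP defines. The representation (a list of parts, each a list of chains, each a list of vertices) and the name `ocp_precedes` are assumptions made for illustration only.

```python
def ocp_precedes(parts, x, y):
    """Return True iff x strictly precedes y in the ordered chain partition.

    parts: list of parts in their total order; each part is a list of chains
    and each chain is a list of vertices in chain order (hypothetical layout).
    """
    where = {}
    for i, part in enumerate(parts):
        for j, chain in enumerate(part):
            for k, v in enumerate(chain):
                where[v] = (i, j, k)   # (part index, chain index, rank in chain)
    (px, cx, kx), (py, cy, ky) = where[x], where[y]
    if px != py:                       # parts are totally ordered
        return px < py
    return cx == cy and kx < ky        # same part: only same-chain pairs compare


parts = [[["a"]], [["b", "c"], ["d"]], [["e"]]]
print(ocp_precedes(parts, "b", "c"), ocp_precedes(parts, "c", "d"))  # True False
```

Here $c$ and $d$ lie in different chains of the same part, so neither precedes the other.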

The algorithm still uses the center rule and the pivot rule (see Lemma 1).
The chains are moved by these two rules, according to the adjacency between
their representative vertex and the center or the pivot. There is a third rule,
the chaining rule (line 9 of the algorithm of Figure 7). Unlike the first two,
this third rule removes comparisons from the order. This rule first concatenates
a sequence of monochain parts that occur consecutively in $O$ into one chain.
Then this new chain is inserted into one of the two parts neighboring the chain,
say $P$ (see Figure 5). The comparisons between the chain and $P$ are lost. But
since the number of chains strictly decreases during the algorithm, the process
is guaranteed to end. Finally, the following invariant is satisfied.

Fig. 5. The chaining rule, chaining the black vertices into $P$ (before and after).

Invariant 1. The ordered chain partition $O$ is an MFPO of $V(G)$ and no chain
is overlapped by a strong module.

Fig. 6. Position of a strong module $M$ (black vertices) in $O$ (Invariant 1).
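The list manipulation behind the chaining rule can be sketched as follows, using the same hypothetical layout as before (a part is a list of chains, a chain a list of vertices). In the algorithm the receiving part is determined by the new center; here the side is simply a parameter, and chain representatives are not modelled. The name `chaining_rule` is ours.

```python
def chaining_rule(parts, lo, hi, into_left):
    """Concatenate the consecutive monochain parts parts[lo..hi] into one chain
    and insert it into the neighbouring part on the left (parts[lo - 1]) or on
    the right (parts[hi + 1]).  Returns a new list of parts."""
    block = parts[lo:hi + 1]
    if any(len(part) != 1 for part in block):
        raise ValueError("the chaining rule only merges monochain parts")
    target = lo - 1 if into_left else hi + 1
    if not 0 <= target < len(parts):
        raise ValueError("no neighbouring part on that side")
    chain = [v for part in block for v in part[0]]  # keeps the order of O
    new_parts = parts[:lo] + parts[hi + 1:]
    k = lo - 1 if into_left else lo                 # index of target afterwards
    # Adding the chain as a new chain of the target part drops every
    # comparison between the chain and the other chains of that part.
    new_parts[k] = new_parts[k] + [chain]
    return new_parts


parts = [[["a"], ["b"]], [["x"]], [["y", "z"]], [["t"]]]
print(chaining_rule(parts, 1, 2, into_left=True))
# [[['a'], ['b'], ['x', 'y', 'z']], [['t']]]
```

Two chains disappear into one, which illustrates why the number of chains decreases.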



To use any vertex $O(1)$ times as a pivot, the algorithm picks only one vertex
per part to extend the OCP (instead of all the vertices of the part, as [16] did). A
chain $C$ is used if its representative vertex $r(C)$ has already been used as a pivot by
some extension rule. Similarly, a part is used if it contains a used chain. Pivots
may be chosen from unused parts only: this ensures that each vertex neighborhood
is used $O(1)$ times. Unlike in the algorithm of Figure 3, when all parts are used,
the non-trivial (multichain) parts are not necessarily modules. The algorithm then
chooses a new center and recurses (see line 12 of the algorithm of Figure 7).

Refine(G, O, [[i, j]], c)   /* [[i, j]] denotes the working factor of O, see below */

1. Split [[i, j]] using the center rule with c   /* [[i, j]] is where the pivots are chosen */
2. While some multichain part exists in [[i, j]] Do
3.    While there is an unused part in [[i, j]] Do
4.       Select an unused part P ⊆ [[i, j]] and a chain C ∈ P
5.       Extend O using the pivot rule with p = r(C)
6.    End of while
7.    If some multichain part exists in [[i, j]] Then
8.       Find the multichain part P' of the new center c_n
9.       Create the new chain S containing c and c_n using the chaining rule
10.      Add S to P'
11.      Extend O using the pivot rule with c_n
12.      Refine(G, O, P(c_n), c_n)
13.   End of if
14. End of while

Fig. 7. Linear-time algorithm.

Choice of the new center (line 8 of the algorithm of Figure 7). As already seen in
the algorithm of [16], the center plays an important role (see Lemma 1). The rule
described below was already defined and proved in the context of cographs [17].
Indeed, the following invariant is the basis of the correctness proof:

Invariant 2. Let $M$ be a strong module and $c$ be the center. Then either $c$
belongs to $M$, or $M$ consists of consecutive monochain parts, or $M$ is included
in a single part $P$.

The new center $c_n$ must fulfill Invariant 2, as the old center $c$ did. If all the
strong modules containing $c$ but not $c_n$ are included in $P(c)$, then Invariant 2
holds. Let $P_L$ (resp. $P_R$) be the rightmost (resp. leftmost) multichain part that
precedes (resp. follows) $c$. As both parts are used, their pivots $p_L$ and $p_R$ are
defined. One of them is chosen for the recursive call, and its pivot becomes the
new center. Only one pivot among $p_L$ and $p_R$ distinguishes the other from the
center $c$. The rule chooses that pivot (without loss of generality, say $p_L$; see
Figure 8.b) as the new center. A simple adjacency test between $p_L$ and $p_R$ is
enough to implement that choice. Assume the other choice is made, i.e. $p_L$
distinguishes $c$ and $p_R$, and yet $p_R$ has been chosen as the new center. There
could then exist a module containing $c$ and $p_L$ but not $p_R$: such a module would
violate Invariant 2, a contradiction.
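Why one adjacency test suffices can be made explicit. Since no pivot is bad, $p_L$ (which precedes $c$) is not adjacent to $c$, so it distinguishes $c$ and $p_R$ exactly when $p_L$ is adjacent to $p_R$; symmetrically, $p_R$ is adjacent to $c$ and distinguishes $c$ and $p_L$ exactly when it is not adjacent to $p_L$. Exactly one of the two therefore qualifies. A minimal Python sketch of this reasoning, with hypothetical names and a dict-of-sets adjacency:

```python
def distinguishes(adj, z, x, y):
    """z distinguishes x and y if it is adjacent to exactly one of them."""
    return (x in adj[z]) != (y in adj[z])


def choose_new_center(adj, c, p_left, p_right):
    """Pick, among p_left (before c) and p_right (after c), the pivot that
    distinguishes the other one from the center c.

    With no bad vertex, p_left is not adjacent to c and p_right is, so the
    choice reduces to a single adjacency test between p_left and p_right.
    """
    assert c not in adj[p_left] and c in adj[p_right], "bad pivot"
    return p_left if p_right in adj[p_left] else p_right


adj = {"l": {"r"}, "c": {"r"}, "r": {"l", "c"}}
print(choose_new_center(adj, "c", "l", "r"))   # l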



Fig. 8. In case a), the new center is $p_R$; in case b), $p_L$ is chosen.
Center and pivot rules modified. The pivot rule works as described in
Lemma 1 (rule 2). The center rule should be handled carefully. It breaks $P(c)$,
the part containing the new center $c$, into three parts (see Lemma 1, rule 1). The
case where $N(c) \cap P(c)$ or $\overline{N}(c) \cap P(c)$ is empty could hinder the algorithm, since
the number of parts does not increase (a similar problem was observed in [17]
for cograph recognition). This case occurs when $c$ and the previous center
$c_p$ are both adjacent (respectively both nonadjacent) to the other representatives of
$P(c)$. It can be shown that $P(c)$ has at least three chains. Therefore, if $C(c_p)$,
the chain containing the old center, is put in that empty part, cycling is avoided
and the module-factorizing property is still valid.



The chaining rule (line 9 of the algorithm of Figure 7). When a new center $c_n$
is chosen, there are only monochain parts between $c_n$ and $c$. The chain to
concatenate with $P(c_n)$, using the chaining rule, starts from $P(c_n)$, contains $c$, and
extends until a certain chain $C(a)$ that is contained in a monochain part. Without
loss of generality, assume that $c_n \prec_O c$. Let (1) be the following property of a
part $P$ in the working factor $[\![i, j]\!]$:

$$P \succeq_O P(c) \text{ is a monochain part } \{C\} \text{ and } (r(C), c_n) \notin E \qquad (1)$$

Let $P'$ be the leftmost part (w.r.t. $O$) that violates (1) and such that every
part between $P(c)$ and $P'$ satisfies (1). Since $P(c)$ fulfills (1), part $P'$ exists.
The chain $S$ to concatenate with $P(c_n)$ starts from the part that follows $P(c_n)$
and extends over every part up to, but not including, $P'$. Lemma 2, required in
the proof of Theorem 1, ensures that the strong modules containing $c$ but not
$c_n$ are included in $P(c_n)$. Invariant 2 is therefore fulfilled, and the algorithm can be
relaunched.

Lemma 2. Let $P(c)$ be the part resulting from the concatenation of $P(c_n)$ and
$S$ by the chaining rule. Every module containing $c$ but not $c_n$ is included in $P(c)$.

Notice that the part $P(c)$ now contains two used chains, $C(c)$ and $C(c_n)$.
But the center rule, at the next recursive call, will distinguish them. Hence the
invariant property (useful for proving the time complexity) that every part contains
at most one used chain holds.



The bad pivots problem, and the working factor. In [16], it has been
shown that the refinement rules (center and pivot rules, see Lemma 1) can be
applied as long as any pivot that precedes the center $c$ is non-adjacent to it,
while any pivot that follows the center is one of its neighbors. In the new algorithm,
the choice of a new pivot could break that property. A vertex $x \in V$ is said to
be bad if:

either $x \prec_O c$ and $x \in N(c)$, or $x \succ_O c$ and $x \in \overline{N}(c)$.

A chain is bad if its representative is bad, and a part is bad if it contains a
bad chain. Notice that the choice of pivot is restricted to a working factor, but
any refinement rule applies to any part (even those that do not belong to the
working factor). The vertices of the working factor are those that are in $P(c)$
when $c$ becomes the new center. They are exactly the scope of the center rule
that is used with $c$. Even after some splits of $P(c)$, the working factor remains a
factor of $O$. It follows that the working factor contains no bad vertex w.r.t. the
current center. The working factor is denoted $[\![i, j]\!]$, where $i$ and $j$ are two
integers such that, for any linear extension $\sigma$ of $O$, $x$ is in the working factor
iff $i \le \sigma(x) \le j$.
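Both notions can be spelled out in a few lines of Python. This is a minimal sketch, assuming the order is given as a precedence predicate (here a total order, for simplicity) and that positions in the linear extension are counted from 1; the names `is_bad` and `working_factor` are hypothetical.

```python
def total_order(sequence):
    """Precedence predicate of a linear order given as a list."""
    rank = {v: k for k, v in enumerate(sequence)}
    return lambda x, y: rank[x] < rank[y]


def is_bad(adj, precedes, x, c):
    """x is bad w.r.t. the center c if it precedes c while being adjacent to
    it, or follows c while being non-adjacent to it."""
    return ((precedes(x, c) and x in adj[c])
            or (precedes(c, x) and x not in adj[c]))


def working_factor(sigma, i, j):
    """Vertices x with i <= sigma(x) <= j, positions counted from 1."""
    return sigma[i - 1:j]


sigma = ["a", "l", "c", "r", "s"]
adj = {"c": {"a", "r"}}          # only the neighbourhood of c is needed here
prec = total_order(sigma)
print([x for x in sigma if is_bad(adj, prec, x, "c")])  # ['a', 's']
print(working_factor(sigma, 2, 4))                      # ['l', 'c', 'r']
```

In this toy order, the factor at positions 2 to 4 contains no bad vertex, which is the situation the text guarantees for the working factor.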

The following invariant shows that the bad parts are “almost” modules (in- 
deed they are the union of some strong modules) and explains the role of the 
working factor. 

Invariant 3.

1. Let $x \in V$ be a bad vertex. If a part $P$ is bad, then all of its chains are bad,
no strong module overlaps $P$, and $P \cap [\![i, j]\!] = \emptyset$.

2. Let $[\![i, j]\!]$ be the working factor and $c$ the center. No part overlaps $[\![i, j]\!]$. If a
strong module $M$ overlaps $[\![i, j]\!]$, then $c \in M$.

Line 11 of the algorithm of Figure 7 uses the new center once more as a pivot
in order to avoid bad chains in the incoming working factor. It is worth
remarking that the working factors are nested. Moreover, the working
factor returned by any recursive call contains only monochain parts (a total
order on its vertices). As the whole vertex set is the working factor of the main
(initial) call, when it ends, $V$ is linearly ordered in a factorizing permutation.
Thus we have:

Theorem 1. The algorithm of Figure 1 computes a factorizing permutation of 
a graph G. 



5 Linear-Time Implementation 

The algorithm described above can be implemented to run in $O(n + m)$ time. It
uses a simple implementation of the OCP, presented below. The main points of
the complexity analysis are explained.



Ordered Chain Partition (see Figure 9). The parts of the OCP form a doubly
linked list. Each part has a doubly linked list of its representative vertices.
The order inside this list does not matter. The part of a representative vertex $x$
is stored explicitly in a field Part[x]. Finally, each representative vertex
points to its part and maintains an ordered list of the chain $C(x)$, with pointers
to its head and tail. The concatenation of two chains can thus be performed
in $O(1)$ time.
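The pointer structure described above can be mimicked with Python objects standing in for pointers. This is a minimal sketch: the class names `Cell`, `Chain` and `Part` are hypothetical, and each part keeps its representatives in a Python set rather than the doubly linked list of the paper (both give $O(1)$ insertion and deletion). What it shows is why concatenating two chains costs $O(1)$: only the tail of one list and the head of the other are relinked.

```python
class Cell:
    """One vertex of a chain, doubly linked to its neighbours in the chain."""
    __slots__ = ("vertex", "prev", "next")

    def __init__(self, vertex):
        self.vertex, self.prev, self.next = vertex, None, None


class Chain:
    """Ordered list of the vertices of a chain, with head and tail pointers."""

    def __init__(self, vertex):
        self.head = self.tail = Cell(vertex)
        self.rep = vertex                      # representative vertex r(C)

    def concatenate(self, other):
        """Append the cells of `other` after this chain in O(1) time."""
        self.tail.next, other.head.prev = other.head, self.tail
        self.tail = other.tail
        other.head = other.tail = None         # `other` is now empty

    def vertices(self):
        out, cell = [], self.head
        while cell is not None:
            out.append(cell.vertex)
            cell = cell.next
        return out


class Part:
    """Node of the doubly linked list of parts, in the order of the OCP."""

    def __init__(self):
        self.prev = self.next = None
        self.reps = set()   # representatives (a set stands in for a linked list)

    def insert_after(self, new):
        new.prev, new.next = self, self.next
        if self.next is not None:
            self.next.prev = new
        self.next = new


part_of = {}   # the field Part[x], for every representative vertex x

c1, c2 = Chain("a"), Chain("b")
c2.concatenate(Chain("c"))
c1.concatenate(c2)
print(c1.vertices(), c1.rep)   # ['a', 'b', 'c'] a
```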
Fig. 9. Implementation of an Ordered Chain Partition.



Implementation of the Refine procedure. First, to choose a pivot in a
given part, the algorithm simply selects the head of its list (line 4 of the algorithm
of Figure 7). Moreover, the choice of an unused part (line 4) can be done in
$O(1)$ time within the working factor. Indeed, it suffices to maintain one stack per
recursive call containing such parts. Each time a new unused part is created
(when a part is split), it is pushed on the corresponding stack. Finally, a search
of the working factor starting from the part containing the current center finds, if
they exist, the parts $P_L$ and $P_R$ (line 8 of the algorithm of Figure 7). If the working
factor is completely visited, then the recursion stops since no such part exists.

From the above discussion and Lemma 3, it follows that the overall running
time of the recursive calls, apart from the time spent by the refinement rules, is
$O(n)$.

Lemma 3. A given vertex can be used at most once as center and at most twice to
extend the OCP.

Implementation of the refinement rules. As in [16], the center rule (first line
of the algorithm of Figure 7) and the pivot rule (fifth line) can be processed in
$O(|N(x)|)$ time, where $x$ is either the center or the pivot. For the chaining rule
(line 9), the “closest” part that does not satisfy Property (1) has to be found.
A search in the list of parts is necessary, and an adjacency test has to be done
between the new center and the pivot of the current part. It is possible to show
that these tests can be done in $O(1)$ amortized time.

Theorem 2. Given a graph $G = (V, E)$, the time bound of the algorithm of
Figure 1 is $O(n + m)$.

Acknowledgment. The authors would like to thank D. Corneil, T. Erlebach 
work.
An Execution Example

Vertex x is selected as the first center.

1. z is used as pivot and splits [a, s, y].
2. y is used as pivot and splits [z, t, u, v, w].
3. a and u are used, but do not refine anything. Among u and a, u is chosen as
   the new center (using the adjacency test); x, y, z are chained and u becomes
   the new center.
4. The working factor is [v ... t] and u is the new center. t is used as pivot
   and splits [a, s] and [v, w]; part {a, s}, outside the working factor, is
   also split.
5. v is used as pivot and splits [y-x-z, t].

Every part of the working factor is now monochain, and the parts are linked. The
center changes to a, and the process stops since a is independent.

Fig. 10. The resulting factorizing permutation is a, s, v, w, u, y, x, z, t.
Abstract. We consider delay management in railway systems. Given
delayed trains, we want to find a waiting policy for the connecting trains
minimizing the weighted total passenger delay. If there is a single delayed
train and passengers transfer at most twice along fixed routes, or
if the railway network has a tree structure, the problem can be solved by
reduction to min-cut problems. For delayed passenger flows on a railway
network with a path structure, the problem can be solved to optimality
by dynamic programming. If passengers are allowed to adapt their route
to the waiting policy, the decision problem is strongly NP-complete.



1 Introduction 

In recent years, there has been an increasing focus on delays in railway systems,
and many railway companies try to reduce delays in an effort to increase customer
satisfaction. The possibility that a train gets delayed is always present.
In order to allow passengers to transfer from a delayed feeder train, it can be
beneficial to deliberately delay a connecting train. This can save delayed
passengers from having to wait for the next train. In this case, some of the feeder's
delay is passed on to the connecting train, and all passengers already in the
connecting train also face a delay. Moreover, these passengers may miss their
connection if they wish to transfer at a subsequent station. In general, it might
thus be beneficial to propagate the delays in the network, and one should decide
on a set of waiting trains. Even today, such waiting policy decisions are usually
taken by a human dispatcher. Ideally, a modern railway decision support system
should enable a dispatcher to easily evaluate the impact of his decisions, or even
propose a good waiting policy. This naturally leads to the algorithmic problem
of determining an optimal waiting policy that minimizes the overall passenger
delay.

Although delay management problems have been studied for some years, not 
much is known about the computational complexity of minimizing total pas- 
senger delay in general models. In this paper, we analyze restricted variants of 
the event-activity network model presented in [Nac98], which was analyzed in 
[Sch02]. In this graph representation, vertices represent arrival and departure 
events at stations. Directed edges represent driving and waiting activities for 
trains, as well as transfer activities for passengers. Trips of passengers are
modeled as paths in the network. Each path has a weight representing its importance.
One of the models considered in [Sch02] is based on a bicriteria objective
function. The goal is to simultaneously minimize the perturbation of the timetable
and the total weight of paths that miss a connection. This version of the problem
is known to be weakly NP-hard [Sch02]. The total delay can be efficiently
minimized if the “never-meet property” holds, which basically forbids two delayed
vehicles from meeting at a station. The general model can be solved to optimality
through an ILP-based branch-and-bound algorithm [Sch01,Sch02]. Other
theoretical models consider on-line versions with unknown delays at one bus stop
[APW02] or the influence of buffer times for delays with exponential distribution
for one transfer [Gov98,Gov99]. A fair amount of work has been done using
simulations [Man01,HHW99,OW99]. However, these last studies are less related to this
paper.

As the complexity of the general event-activity network model is still unknown,
we focus on a restricted setting, as formalized below. The basic element of our
model is a direct link between two stations, and each such link is operated by a
single train. Further, we assume that the original timetable is tight, so there is
no possibility for a train to catch up on a delay. In the same spirit, we assume
that all the transfers at one station happen instantaneously. Hence only a single
amount of delay needs to be considered. Obviously this is a fairly strong restriction
of the original model. Nevertheless, we are convinced that the key combinatorial
structure remains. As soon as the decision on which passengers to drop is
taken, it is easy to produce a modified timetable that minimizes the remaining
delays. The general model as well as this model are easily solvable as soon as
we know how many passengers use a certain transfer. The difficulty seems to be
that dropping passengers somewhere has a significant effect on these numbers
throughout the network. Our model singles out this particular phenomenon. As
we cannot even analyze this restricted model in its entirety, it appears that we
did not strip away all of the complications of the event-activity network.

For simplicity, we include the externally caused delay of passengers in the
objective function. Clearly, such delays cannot be optimized, and their contribution
to the objective function is known a priori. As long as we focus on optimal
solutions, this offset has no impact on the complexity. Note, however, that this
offset can make a considerable difference for approximation ratios and competitive
analyses.

We analyze three different cases of our model: (i) a single delayed train in the
network, with passengers following predefined paths; (ii) origin-destination paths
for passenger flows which can have primary unit delays on a railway corridor;
(iii) a single delayed train and origin-destination pairs for the passengers, who
can adapt their route according to the delays in the network. The primary path
delays in the second model may seem unusual, but they should be interpreted
as passengers arriving at a station on a delayed train and wishing to transfer.
Note that an instance of (ii) can be mapped to an instance of (i) by introducing
some additional connections and infinite-weight passenger paths. However, this
usually obscures the graph structure of the original problem. Hence, we analyze
these two models separately.

In this paper, we present the following results. We show that (i) can be solved
efficiently if passengers transfer at most twice, or if the railway network has a tree
structure. We further describe some extensions to more complicated models. For
model (ii) we present a polynomial-time algorithm for railway corridors. Finally,
we show that problem (iii) is strongly NP-hard.

2 General Problem Statement 

This section describes the characteristics common to our three models. The 
specific characteristics of each model are described in the corresponding sections. 

Let $G = (V, E)$ be a directed acyclic graph whose vertices $V$ are
topologically sorted. Each vertex of the graph represents a station. An edge
$e = (u, v) \in E$ represents a direct link from station $u$ to station $v$ operated by
a single train. At every station $v \in V$, the outgoing edges $(v, u)$ represent the
possible connections for all incoming edges $(w, v)$. Thus, a directed path within
$G$ corresponds to a journey a passenger can undertake by transferring to other
trains at each intermediate station.
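The two structural conditions of this model are easy to check mechanically. In this minimal sketch, stations are assumed to be integers numbered in topological order and a train is the pair `(u, v)` of its stations; the helper names are hypothetical.

```python
def is_topologically_sorted(trains):
    """With stations numbered in topological order, every train (u, v)
    goes from a smaller to a larger station number."""
    return all(u < v for u, v in trains)


def is_journey(path):
    """A passenger path: each train departs where the previous one arrived."""
    return all(e[1] == f[0] for e, f in zip(path, path[1:]))


trains = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4)]
print(is_topologically_sorted(trains))       # True
print(is_journey([(0, 1), (1, 3), (3, 4)]))  # True
print(is_journey([(0, 1), (2, 3)]))          # False: no connection at station 1
```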

We distinguish between externally caused primary delays and secondary
delays, which are introduced by the waiting policy. We study the case in which
all primary delays are identical and of size $\delta$. All transfers at a station take
place instantaneously, so the passengers of a delayed train miss their connection
unless it waits for them. Observe that a connecting train must wait for the
entire delay $\delta$ of the delayed feeder train in order to maintain a connection. In
this case, all transfers to the connecting train are maintained, and the entire
delay $\delta$ propagates. Additionally, we assume that a delayed train cannot catch
up on its delay.

All trains are operated according to a periodic timetable with period T, and 
we assume that delays do not propagate to the next period. So, if a person misses 
a train, she continues her journey with the next trains traveling along the same 
route. Since there are no further disturbances in the next period, her arrival is 
thus delayed by T time units. Our analyses below do not depend on the fact that 
all passengers face the same period T. Indeed, our results can be extended to 
models where passengers missing a connection incur a delay depending on their 
route, and on the station where they miss their connection. 

We consider passenger flows in the railway network as a set of paths $\mathcal{P}$ in $G$.
For each passenger path $P \in \mathcal{P}$, we are given a source vertex $s(P)$ and a target
vertex $t(P)$. For a pair of vertices $u, v \in V$, we allow multiple paths in $\mathcal{P}$. Each
path $P \in \mathcal{P}$ is defined as the ordered set of edges leading from $s(P)$ to $t(P)$.
Further, each path $P \in \mathcal{P}$ has a weight $w(P)$, which represents the number of
passengers, or the path's importance in a more abstract sense.

Our objective is to minimize the total weighted delay of all passenger paths
in the network, given the primary delays $\delta$. A path $P \in \mathcal{P}$ contributes to the
objective function as follows: $0$ if $P$ arrives as scheduled, $\delta \cdot w(P)$ if $P$ arrives
with a delay of $\delta$, and $T \cdot w(P)$ if $P$ misses a connection. We refer to paths $P$
with zero delay as on-time paths, to those facing a delay of $\delta \cdot w(P)$ as delayed
paths, and to paths facing a delay of $T \cdot w(P)$ as dropped paths.
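The objective can be written as a short function. This sketch rests on our reading of the model: a path is dropped as soon as it connects from a delayed train to an on-time one, and otherwise it arrives $\delta$ late exactly when its last train is delayed. Paths are lists of train names with a weight, and the function names are hypothetical. The example reproduces the four policies of the two-train example ($P_1 = \{e\}$, $P_2 = \{e, f\}$) discussed with the minimum cut reduction below.

```python
def path_delay(path, w, delayed, delta, T):
    """Contribution of one passenger path to the objective.

    path: the trains of P in travel order; w: the weight w(P);
    delayed: the set Delta of delayed trains.
    """
    for e, f in zip(path, path[1:]):
        if e in delayed and f not in delayed:
            return T * w                              # missed connection: dropped
    return delta * w if path[-1] in delayed else 0    # delayed or on time


def total_delay(paths, delayed, delta, T):
    return sum(path_delay(p, w, delayed, delta, T) for p, w in paths)


# Two trains e, f with w(P1) = 3, w(P2) = 2, delta = 5, T = 60
paths = [(["e"], 3), (["e", "f"], 2)]
for policy in [{"e", "f"}, {"e"}, {"f"}, set()]:
    print(sorted(policy), total_delay(paths, policy, 5, 60))
# ['e', 'f'] 25 / ['e'] 135 / ['f'] 10 / [] 0
```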

3 Minimum Cut Reductions for Single Primary Delays 

In this section, we consider the previously described model for the case in which
there is a single train $e_0$ with a primary delay. The passenger paths $\mathcal{P}$ are
assumed to be given. We stress that these paths are fixed, and will be followed by
the passengers whatever delays are introduced by the delay policy. We
assume that the vertex $u$ of the primarily delayed edge $e_0 = (u, v)$ has in-degree
zero, as every instance can be transformed into an equivalent instance with this
property. Let $\Delta \subseteq E$, $e_0 \in \Delta$, be the set of delayed trains, and $\Omega = E \setminus \Delta$ the
on-time trains. We define the single source delay management problem: given an
instance $(G, e_0, \mathcal{P}, w, \delta, T)$, find a set $\Delta \subseteq E$ of waiting trains minimizing the
total weighted delay on the network.

In the following, we show how we can transform this restricted problem to 
a minimum cut problem. We introduce the method for passenger paths with at 
most one transfer and then extend it to two transfers. Thus, the single source 
delay management problem can be efficiently solved if passengers switch trains 
at most twice, i.e. if they use at most three different trains. 

The key idea is to build a new graph $N$ in which every $s$-$t$ cut $[S, \overline{S}]$
represents the delay policy $\Delta = S$, $\Omega = \overline{S}$. To do this, the trains (edges) of the
original graph $G$ are mapped to vertices in the graph $N$; forward and backward
edges are added to $N$ between vertices $e$ and $f$ if passengers in the original graph
can connect from $e$ to $f$. More edges are added, and the weights are defined such
that the cost of a cut corresponds to the total delay occurring if the trains in $S$
are delayed and the trains in $\overline{S}$ depart on time.

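Given the train network $(G = (V, E), e_0, \mathcal{P}, w, \delta, T)$, the equivalent directed
minimum $s$-$t$ cut network $N = (H = (U, \mathcal{E}), s, t, c)$, $s, t \in U$, $c : \mathcal{E} \to \mathbb{N}$,
is built as follows. Set $U = E \cup \{t\}$ and $s = e_0$; let $\mathcal{E} = F_1 \cup F_2$, where
$F_1 = \{(e, f), (f, e) \mid e, f \in E, e = (u, v), f = (v, w)\}$ and $F_2 = \{(e, t) \mid e \in E\}$.
Let $P = \{e, f\}$, $e, f \in E$, be an arbitrary passenger path of length 2, from source
station $s(P)$ to target station $t(P)$. The passengers in $P$ change trains exactly
once. Let $\mathcal{P}(e_i)$ be the set of paths using edge $e_i$. We define the edge costs in $N$
as follows (see Figure 1 for an example):

$$c(e, f) = \begin{cases} (T - \delta) \cdot w(P) & \text{if } P = \{e, f\} \in \mathcal{P}, \\ \delta \cdot w(P) & \text{if } P = \{f, e\} \in \mathcal{P}, \\ \delta \cdot \sum_{P \in \mathcal{P}(e)} w(P) & \text{if } e \in U,\ f = t. \end{cases}$$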

Let $G' = (V', E')$ be a directed graph, $c : E' \to \mathbb{N}$ a weight function, and let
$\{s', t'\} \subseteq V'$. A directed cut is a partition of $V'$ into two sets $S'$ and
$\overline{S'} = V' \setminus S'$, such that $s' \in S'$ and $t' \in \overline{S'}$. Letting
$E'_C = \{(e, f) \in E' : e \in S', f \in \overline{S'}\}$, the cost of the cut is defined as
$C(S', \overline{S'}) = \sum_{(e, f) \in E'_C} c(e, f)$.
Fig. 1. The four possible cuts for two connecting trains with paths $P_1 = \{e\}$ and
$P_2 = \{e, f\}$ of weight $w(P_1)$ and $w(P_2)$, respectively.

| $\Delta$ | $\Omega$ | Total weighted delay |
|---|---|---|
| $\{e, f\}$ | $\emptyset$ | $\delta \cdot (w(P_1) + w(P_2))$ |
| $\{e\}$ | $\{f\}$ | $\delta \cdot w(P_1) + T \cdot w(P_2)$ |
| $\{f\}$ | $\{e\}$ | $\delta \cdot w(P_2)$ |
| $\emptyset$ | $\{e, f\}$ | $0$ |

Fig. 2. The four possible delay policies for two connecting trains with paths
$P_1 = \{e\}$ and $P_2 = \{e, f\}$, with weight $w(P_1)$ and $w(P_2)$, respectively.

Fig. 3. Reduction for a path $P = \{e, f, g\}$.

Lemma 1. The cost of a minimum directed $s$-$t$ cut $[S, \overline{S}]$ in $N$ is equal to the
minimum total weighted delay in $G$ for the delay policy $\Delta = S$, $\Omega = \overline{S} \setminus \{t\}$,
given that passengers connect at most once.

The proof is by analysis of the cases shown in Figure 1 and Figure 2. Details can 
be found in [GGJ+04]. 
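The case analysis behind Lemma 1 can be replayed by enumerating every cut of the small network. In this sketch we read $\mathcal{P}(e)$ in the cost of the edge $(e, t)$ as the paths whose first train is $e$; this is our assumption, made because it reproduces the four delays of Figure 2 (if every path merely using $e$ were counted, $P_2$ would be charged $\delta \cdot w(P_2)$ twice when both trains wait). Paths are lists of train names with a weight; `build_network`, `cut_cost` and `SINK` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

SINK = "SINK"   # the vertex t of N


def path_delay(path, w, delayed, delta, T):
    """Objective contribution of one path (dropped, delayed or on time)."""
    if any(e in delayed and f not in delayed for e, f in zip(path, path[1:])):
        return T * w
    return delta * w if path[-1] in delayed else 0


def build_network(paths, delta, T):
    """Edge costs of N for paths with at most one transfer.  Assumption: the
    (e, t) edge collects delta * w(P) over the paths whose first train is e."""
    c = defaultdict(int)
    for path, w in paths:
        c[(path[0], SINK)] += delta * w
        if len(path) == 2:
            e, f = path
            c[(e, f)] += (T - delta) * w   # forward edge (e, f)
            c[(f, e)] += delta * w         # backward edge (f, e)
    return c


def cut_cost(c, S):
    """Cost of the directed cut [S, complement of S]; SINK is never in S."""
    return sum(cost for (u, v), cost in c.items() if u in S and v not in S)


# The example of Figures 1 and 2
paths, delta, T = [(["e"], 3), (["e", "f"], 2)], 5, 60
c = build_network(paths, delta, T)
for k in range(3):
    for S in combinations(["e", "f"], k):
        S = set(S)
        print(sorted(S), cut_cost(c, S),
              sum(path_delay(p, w, S, delta, T) for p, w in paths))
```

On each printed line the cut cost and the total weighted delay coincide, as Lemma 1 requires.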

This approach can be extended to passenger paths changing trains twice. The key idea is to build an additional structure for each path P of length three. Basically, we use the same construction as in the previous reduction to account for the (T − δ) weight which delayed paths experience when they are dropped. For paths using three trains, at most one such edge can traverse the cut. As for the δ delay, we cannot keep the previous reduction, as two edges weighted with δ·w(P) could traverse the cut. This problem can be solved by introducing an additional vertex for each path. The weights of the edges connecting it are chosen such that it lies in S as soon as one vertex of the path is in S. Finally, we add an edge connecting this vertex to t, with weight δ·w(P): this edge is in the cut as soon as P is delayed, providing the correct weight for being delayed. So, we extend the previous reduction for paths P composed of three edges, implying a double transfer, as follows (see Figure 3). For each such path P = {e_1, e_2, e_3}, we add a vertex v_P. The vertices e_i ∈ P are connected to v_P through edges (e_i, v_P) of weight c(e_i, v_P) = ∞. Further, we add edges (e_1, e_2) and (e_2, e_3) with w(e_1, e_2) = w(e_2, e_3) = (T − δ)·w(P), and an edge (v_P, t) of weight w(v_P, t) = δ·w(P).
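To see that this gadget charges the right amount, the Python sketch below (our own encoding; the vertex labels and function names are hypothetical) builds the edges added for a single path P = {e_1, e_2, e_3} and, for a given set of delayed trains placed in S, puts v_P on whichever side of the cut is cheaper. Under these assumptions, a path whose trains are all on time costs 0, a path that is delayed but keeps its connections costs δ·w(P), and a path that is dropped at a transfer costs T·w(P).

```python
from math import inf


def gadget_edges(p, weight, T, delta):
    """Edges added for one path p = (e1, e2, e3) of weight w(P)."""
    e1, e2, e3 = p
    vp = ("v", p)                               # the auxiliary vertex v_P
    edges = {(e, vp): inf for e in p}           # forces v_P into S if any e_i is
    edges[(e1, e2)] = (T - delta) * weight      # dropped at the first transfer
    edges[(e2, e3)] = (T - delta) * weight      # dropped at the second transfer
    edges[(vp, "t")] = delta * weight           # delayed at all
    return edges, vp


def gadget_cost(p, weight, T, delta, delayed):
    """Cheapest cut cost of the gadget when the trains in `delayed` lie in S
    and the other trains of p lie in S-bar; v_P is placed optimally."""
    edges, vp = gadget_edges(p, weight, T, delta)
    best = inf
    for vp_in_s in (False, True):
        side = set(delayed) | ({vp} if vp_in_s else set())
        cost = sum(c for (a, b), c in edges.items()
                   if a in side and b not in side)
        best = min(best, cost)
    return best
```

For example, with w(P) = 2, T = 5 and δ = 1, delaying only e_1 costs 10 = T·w(P), while delaying only e_3 costs 2 = δ·w(P).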

Lemma 2. The cost of every directed s-t-cut [S, S̄] in N is equal to the total delay on the network G when applying the policy Δ = S⁻, Ω = S̄⁻, where S⁻ = S \ {v_P : P ∈ 𝒫, |P| = 3} and S̄⁻ = S̄ \ ({t} ∪ {v_P : P ∈ 𝒫, |P| = 3}), given that passengers transfer at most twice. Hence, the minimum directed s-t-cut in N corresponds to the minimum delay strategy on G, given that passengers transfer at most twice.

Again, the proof is by case analysis; refer to [GGJ+04] for details.

Theorem 1. The minimum delay policy for (G, e_0, 𝒫, w, δ, T) where paths do not transfer more than twice can be found efficiently by reduction to a polynomial-size directed minimum s-t-cut problem.

Proof. When building the graph N, we introduce a vertex for each train and for each path of length three, i.e., O(|E| + |𝒫|) vertices. Similarly, we introduce one edge (e, t) for each train and O(1) edges for each path, i.e., O(|E| + |𝒫|) edges in total. A minimum directed cut on a graph with n vertices and m edges can be found in O(nm log n) time [ST83]. Hence, we need O((|E| + |𝒫|)² log(|E| + |𝒫|)) time to find a minimum delay policy.

4 Minimum Cut Reductions for Special Cases 

Next, we show how the min-cut approach can be extended to tree-like networks with arbitrary paths as well as to trains with intermediate stops with passengers transferring at most once. The proposed reduction can be adapted straightforwardly to paths of arbitrary length, given that the graph G is an out-tree. [Sch02] showed that the “never-meet-property” applies to out-trees; hence, the delay management problem can be solved efficiently on them. Our reduction provides another method for solving this special network structure.

The minimum cut approach can also be extended to train networks where trains have intermediate stops, given that the passengers transfer at most once. We assume the passenger paths are given as in the single source delay management problem. The set ℛ of trains with intermediate stops is represented as non-empty, edge-disjoint paths R = {f_1, ..., f_r}, f_i ∈ E, 1 ≤ i ≤ r. All edges in the graph are direct train connections. By restricting |R| = 1 for all R ∈ ℛ, we get the single source delay management problem. Since we assume tight timetables, all subsequent edges in a train's path are delayed as soon as some edge is delayed. Similarly to the previous reductions, we map the edges E to vertices U, add a vertex t to U and set s = e_0. For each train R = {f_1, f_2, ..., f_r}, we introduce edges (f_i, f_{i+1}), 1 ≤ i < r, with weight w(f_i, f_{i+1}) = ∞. As soon as f_i ∈ S, such edges prevent the vertices f_k, i < k ≤ r, from being in S̄, providing the required consistency that delayed trains cannot catch up on their delay. Hence, each train in ℛ can become delayed, but it will never be on time again.
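The effect of the infinite edges can be checked mechanically. In the Python sketch below (edge labels are illustrative strings, and the function names are ours), a set of a train's edges placed in S yields a finite cut only if no edge (f_i, f_{i+1}) leaves S; enumerating all subsets shows that the admissible delayed sets are exactly the suffixes {f_k, ..., f_r} of the train, plus the empty set.

```python
from itertools import combinations
from math import inf


def chain_edges(train):
    """Infinite-weight edges (f_i, f_{i+1}) along one train's edge sequence."""
    return {(train[i], train[i + 1]): inf for i in range(len(train) - 1)}


def cut_weight(train, delayed):
    """Weight of the chain edges cut when `delayed` is in S, the rest in S-bar."""
    return sum(c for (a, b), c in chain_edges(train).items()
               if a in delayed and b not in delayed)


def finite_delay_sets(train):
    """All subsets of the train's edges that a finite cut can put in S."""
    return [set(combo)
            for k in range(len(train) + 1)
            for combo in combinations(train, k)
            if cut_weight(train, set(combo)) < inf]
```

For a train f1, f2, f3 the admissible sets are ∅, {f3}, {f2, f3} and {f1, f2, f3}: once a train is delayed, it stays delayed.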

The edges and their weights are defined as follows. For all paths P = {f_1, ..., f_l}, f_i ∈ R, using only train R, and which hence do not connect, we introduce the edges (f_1, t) and (f_l, f_1) with weights w(f_1, t) = w(f_l, f_1) = δ·w(P). For the paths P = {f_1, ..., f_l, g_1, ..., g_m}, f_i ∈ R_1, g_i ∈ R_2, which use two trains and thus need to connect, we must extend the single-train reduction above. As before, we introduce (f_1, t) and (f_l, f_1) with w(f_1, t) = w(f_l, f_1) = δ·w(P). These edges account for the delay that can be accumulated on the first train. Additionally, we introduce (f_l, g_1) with w(f_l, g_1) = (T − δ)·w(P); such edges account for the additional dropping delay passengers experience if the first train is delayed and the second departs on time. Finally, (g_m, f_l) with w(g_m, f_l) = δ·w(P) accounts for the delay passengers experience if they get delayed on the second train.

Lemma 3. The cost of the minimum s-t-cut [S, S̄] on N equals the total weighted delay of the minimum delay waiting policy for (G, e_0, 𝒫, ℛ, w, δ, T), given that passengers transfer at most once.

For the proof and other details, we refer to [GGJ+04].

5 Multiple Path Delays on a Railway Corridor 

This section describes a model for multiple primary delays on a railway corri- 
dor. For this model, we present an enumeration tree, and show that equivalent 
subtrees of that enumeration tree can be pruned. This observation leads to a 
polynomial time algorithm for the minimum total weighted passenger delay ob- 
jective. Finally, we show that the pruned enumeration tree algorithm boils down 
to a dynamic programming algorithm. 



5.1 The Model 



The model in this section has two specific characteristics. First, we consider a 
corridor in the railway network, and deal with the primary delays that enter that 
corridor through delayed feeder trains. Since a corridor corresponds to a path in 
the railway network, this implies that G is a path. Second, each passenger path 
P ∈ 𝒫 has a primary delay δ(P) ∈ {0, 1}. Such a delayed passenger path should
be interpreted as having arrived with some delayed feeder train. The primary 
path delay can take only two values, either one time unit or zero time units. 
However, our analysis below only uses the fact that all non-zero primary delays 
are identical. As before, this implies that a connecting train either departs as 
scheduled, or it waits for all delayed passengers. 

In this case, we denote the ordered vertex set by V = (v_1, ..., v_{m+1}), with v_1 < ... < v_{m+1}, and the edge set by E = (e_1, ..., e_m), with e_i = (v_i, v_{i+1}). The decision whether a connecting train waits for delayed passengers or not is modeled by the following wait-depart decision variable:

$$
x_i =
\begin{cases}
1 & \text{if train } e_i \text{ delays its departure from station } v_i \text{ by one time unit},\\
0 & \text{if train } e_i \text{ departs from station } v_i \text{ as scheduled.}
\end{cases}
$$



As before, the objective is to minimize the total weighted path arrival delay over all possible vectors (x_1, ..., x_m) ∈ {0, 1}^m.
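The objective can be made concrete with a direct simulation. In this Python sketch (an encoding chosen for illustration, not taken from the paper), stations are numbered 1, ..., m + 1, train e_i runs from station i to station i + 1, and a passenger path is a tuple (s, t, δ, w) that rides trains e_s, ..., e_{t−1}. As in the update rules of Section 5.2, passengers arriving one unit late at a train that departs on time are dropped and charged T·w(P); otherwise a path arrives with the delay of its last train. The exhaustive minimum over all 2^m vectors is only a baseline for small m.

```python
from itertools import product


def path_delay(path, x, T):
    """Weighted arrival delay of one path (s, t, delta, w) under vector x,
    where x[i - 1] is the wait-depart decision for train e_i."""
    s, t, delta, w = path
    delay = delta                      # primary delay on arrival at station s
    for i in range(s, t):              # the path rides trains e_s, ..., e_{t-1}
        if delay == 1 and x[i - 1] == 0:
            return T * w               # train leaves on time: connection missed
        delay = x[i - 1]               # passengers now carry train e_i's delay
    return delay * w


def total_delay(paths, x, T):
    return sum(path_delay(p, x, T) for p in paths)


def best_policy(paths, m, T):
    """Exhaustive minimum over all 2^m wait-depart vectors."""
    return min((total_delay(paths, x, T), x) for x in product((0, 1), repeat=m))
```

For instance, with T = 3, a delayed path (1, 3, 1, 3) and an on-time path (2, 3, 0, 2), letting both trains wait costs 3 + 2 = 5, whereas every policy that drops the first path costs at least 9.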
5.2 Binary Enumeration Tree

This section describes an m-level binary enumeration tree H for the values of the decision variables x_i. In H, branching between the levels i − 1 and i represents choosing a value for the variable x_i. A node at level i in H has a label (x_1, ..., x_i), where x_1, ..., x_i are the decisions taken on the unique path from the root node to that node. The root node itself has the empty label (). So, the label (x_1, ..., x_i) immediately contains the partial solution at the node, and a leaf node (x_1, ..., x_m) at level m represents a solution to the model.

At node (x_1, ..., x_i), wait-depart decisions have been taken for the trains e_1, ..., e_i. We keep track of the impact of these decisions through the following functions of the nodes (we omit the node label brackets here to improve readability):

$$A(x_1,\dots,x_i) = \{P \in \mathcal{P} \mid s(P) \le v_i < t(P),\ P \text{ not dropped by } x_1,\dots,x_i\}$$

$$D(x_1,\dots,x_i) = \sum_{\substack{P \in \mathcal{P}:\ P \text{ dropped}\\ \text{by } x_1,\dots,x_i}} T \cdot w(P) + \sum_{\substack{P \in \mathcal{P}:\ t(P) \le v_i,\ P \text{ delayed}\\ \text{by } x_1,\dots,x_i}} w(P)$$

$$D_m(x_1,\dots,x_m) = D(x_1,\dots,x_m) + \sum_{P \in A(x_1,\dots,x_m)} w(P) \cdot x_m$$

The set of active passenger paths A(x_1, ..., x_i) contains all paths P that are traveling in train e_i, given the decisions at node (x_1, ..., x_i). In a complementary sense, D(x_1, ..., x_i) contains the weighted delay accumulated so far by the decisions at node (x_1, ..., x_i). So, at any level i in H, each path P ∈ 𝒫 with s(P) ≤ v_i is either contained in A(x_1, ..., x_i), or its weighted arrival delay is accounted for in D(x_1, ..., x_i). Note that A(x_1, ..., x_i) contains paths P with t(P) = v_{i+1}, although the arrival delay for such paths is already known once the decisions x_1, ..., x_i have been taken. In particular, this means that the active path set A(x_1, ..., x_m) at a leaf node may be non-empty. Therefore, D_m(x_1, ..., x_m) also accounts for the weighted arrival delay at v_{m+1} of all passenger paths P on train e_m, given the decisions at node (x_1, ..., x_m). An optimal solution to our model is then represented by a leaf node (x_1, ..., x_m)* that attains a minimum value D_m(x_1, ..., x_m).

For the root node (), we set D() = 0 and A() = ∅. Below, we specify the initialization for the child nodes (0) and (1) of the root node (). Next, we describe a general child node (x_1, ..., x_i, x_{i+1}) with parent node (x_1, ..., x_i). Since the values of A(x_1, ..., x_{i+1}) and D(x_1, ..., x_{i+1}) depend on the parent's values and on the values of the decisions x_{i+1} and x_i, we distinguish between the four possible combinations for a child node (..., x_i, x_{i+1}).

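Initialization of (0) and (1)

$$A(0) = \{P \in \mathcal{P} \mid s(P) = v_1,\ \delta(P) = 0\}, \qquad D(0) = \sum_{P \in \mathcal{P}:\ s(P) = v_1,\ \delta(P) = 1} T \cdot w(P)$$

$$A(1) = \{P \in \mathcal{P} \mid s(P) = v_1\}, \qquad D(1) = 0$$

(..., 0, 0) nodes

$$A(x_1,\dots,x_{i+1}) = \bigl(A(x_1,\dots,x_i) \setminus \{P \in \mathcal{P} \mid t(P) = v_{i+1}\}\bigr) \cup \{P \in \mathcal{P} \mid s(P) = v_{i+1},\ \delta(P) = 0\}$$

$$D(x_1,\dots,x_{i+1}) = D(x_1,\dots,x_i) + \sum_{P \in \mathcal{P}:\ s(P) = v_{i+1},\ \delta(P) = 1} T \cdot w(P)$$

(..., 0, 1) nodes

$$A(x_1,\dots,x_{i+1}) = \bigl(A(x_1,\dots,x_i) \setminus \{P \in \mathcal{P} \mid t(P) = v_{i+1}\}\bigr) \cup \{P \in \mathcal{P} \mid s(P) = v_{i+1}\}$$

$$D(x_1,\dots,x_{i+1}) = D(x_1,\dots,x_i)$$

(..., 1, 0) nodes

$$A(x_1,\dots,x_{i+1}) = \{P \in \mathcal{P} \mid s(P) = v_{i+1},\ \delta(P) = 0\}$$

$$D(x_1,\dots,x_{i+1}) = D(x_1,\dots,x_i) + \sum_{P \in A(x_1,\dots,x_i):\ t(P) > v_{i+1}} T \cdot w(P) + \sum_{P \in \mathcal{P}:\ s(P) = v_{i+1},\ \delta(P) = 1} T \cdot w(P) + \sum_{P \in A(x_1,\dots,x_i):\ t(P) = v_{i+1}} w(P)$$

(..., 1, 1) nodes

$$A(x_1,\dots,x_{i+1}) = \bigl(A(x_1,\dots,x_i) \setminus \{P \in \mathcal{P} \mid t(P) = v_{i+1}\}\bigr) \cup \{P \in \mathcal{P} \mid s(P) = v_{i+1}\}$$

$$D(x_1,\dots,x_{i+1}) = D(x_1,\dots,x_i) + \sum_{P \in A(x_1,\dots,x_i):\ t(P) = v_{i+1}} w(P)$$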

As an example, and since it is a special case, we briefly discuss the case of a (..., 1, 0) node. Because x_i = 1 and x_{i+1} = 0, all paths in A(x_1, ..., x_i) not ending in v_{i+1} cannot transfer to train e_{i+1} and are dropped, facing a weighted arrival delay of T·w(P). Train e_{i+1} is also missed by all paths P ∈ 𝒫 starting at v_{i+1} with primary delay δ(P) = 1, so these paths also face a weighted arrival delay of T·w(P). Further, since x_i = 1, all paths P ∈ A(x_1, ..., x_i) ending at v_{i+1} arrive with a weighted arrival delay of w(P). Finally, the only paths P that do depart with train e_{i+1} are those starting at v_{i+1} and having δ(P) = 0.

5.3 Pruning Equivalent Subtrees 

A (..., 1, 0) enumeration tree node implies that all passengers wanting to transfer from train e_i to train e_{i+1} will miss their connection, so none of the paths in A(x_1, ..., x_i) enters A(x_1, ..., x_i, x_{i+1}). Therefore, the subtree rooted at node (x_1, ..., x_{i+1}) is in a sense independent of the decisions x_1, ..., x_i taken before. The lemmas below show that this independence allows us to prune the enumeration tree significantly.

Lemma 4. For an enumeration subtree rooted at node (x_1, ..., x_{i+1}), let

$$\Delta_{(x_1,\dots,x_i)}(x_{i+1},\dots,x_{i+k}) := D(x_1,\dots,x_{i+k}) - D(x_1,\dots,x_{i+1})$$

be the weighted delay accumulated in the subtree up to node (x_{i+1}, ..., x_{i+k}). Any two enumeration subtrees rooted at (x_1, ..., x_i, x_{i+1}) and (x'_1, ..., x'_i, x'_{i+1}) with x_i = x'_i = 1 and x_{i+1} = x'_{i+1} = 0 are equivalent in the sense that

$$\Delta_{(x_1,\dots,x_i)}(x_{i+1},\dots,x_{i+k}) = \Delta_{(x'_1,\dots,x'_i)}(x_{i+1},\dots,x_{i+k}), \qquad k = 1,\dots,m-i.$$



Proof. We use the shorthand notation (..., 1, 0) for some node with a label (x_1, ..., x_{i−1}, 1, 0). From the construction of (..., 1, 0) nodes, it is clear that

A(..., 1, 0) = {P ∈ 𝒫 | s(P) = v_{i+1}, δ(P) = 0}.

So, A(..., 1, 0) is the same for each node (..., 1, 0). Therefore, both the enumeration subtrees rooted at (x_1, ..., x_{i+1}) and (x'_1, ..., x'_{i+1}) start with the same path set A(..., 1, 0), and will thus accumulate the same weighted delay until subtree node (x_{i+1}, ..., x_{i+k}).



Lemma 5. Of all subtrees rooted at (x_1, ..., x_{i+1}) with x_i = 1 and x_{i+1} = 0, it suffices to explore the single subtree with root

$$\operatorname*{arg\,min}_{(x_1,\dots,x_{i+1}):\ x_i = 1,\ x_{i+1} = 0} D(x_1,\dots,x_{i+1}).$$



Proof. Lemma 4, with k = m − i, implies that the minimum of D(x_1, ..., x_{i+1}, x_{i+2}, ..., x_m) and D(x'_1, ..., x'_{i+1}, x_{i+2}, ..., x_m) is determined by the minimum value of D(x_1, ..., x_{i+1}) and D(x'_1, ..., x'_{i+1}).
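The four update cases and the pruning of Lemma 5 fit in a short level-by-level search. The Python sketch below is our own encoding (assumptions: stations are numbered 1, ..., m + 1, a path is a tuple (s, t, δ, w) with s < t, and paths are identified by their index). It keeps every child, except that per level only one (..., 1, 0) child with minimum D survives, and it returns the best leaf value D_m with its label together with the number of tree nodes created. A small direct simulation is included so that the result can be cross-checked against exhaustive search.

```python
from itertools import product


def simulate(paths, x, T):
    """Total weighted arrival delay of wait-depart vector x (direct check)."""
    total = 0
    for s, t, delta, w in paths:
        delay, dropped = delta, False
        for i in range(s, t):                  # trains e_s, ..., e_{t-1}
            if delay == 1 and x[i - 1] == 0:   # late passengers miss e_i
                dropped = True
                break
            delay = x[i - 1]
        total += T * w if dropped else delay * w
    return total


def pruned_tree_minimum(paths, m, T):
    """Best leaf (D_m, label) of the pruned tree H and the number of nodes."""
    P = list(paths)                            # path k is (s, t, delta, w)

    def starting(v, delta=None):
        return {k for k, (s, _, d, _) in enumerate(P)
                if s == v and (delta is None or d == delta)}

    def weight(ks, factor=1):
        return sum(factor * P[k][3] for k in ks)

    # Children (0) and (1) of the root; a node is (label, A, D).
    level = [((0,), frozenset(starting(1, 0)), weight(starting(1, 1), T)),
             ((1,), frozenset(starting(1)), 0)]
    count = len(level)
    for i in range(1, m):                      # build level i + 1 (decide x_{i+1})
        v = i + 1                              # station v_{i+1}
        nxt, best_10 = [], None
        for label, A, D in level:
            ending = frozenset(k for k in A if P[k][1] == v)
            staying = A - ending
            if label[-1] == 0:
                # (..., 0, 0): late newcomers miss e_{i+1} and are dropped
                nxt.append((label + (0,), staying | starting(v, 0),
                            D + weight(starting(v, 1), T)))
                # (..., 0, 1): everybody boards; no delay is charged yet
                nxt.append((label + (1,), staying | starting(v), D))
            else:
                # (..., 1, 1): paths ending at v_{i+1} arrive one unit late
                nxt.append((label + (1,), staying | starting(v),
                            D + weight(ending)))
                # (..., 1, 0): dropped paths pay T*w, arriving paths pay w
                child = (label + (0,), frozenset(starting(v, 0)),
                         D + weight(staying, T) + weight(starting(v, 1), T)
                         + weight(ending))
                if best_10 is None or child[2] < best_10[2]:
                    best_10 = child            # Lemma 5: keep one such node
        if best_10 is not None:
            nxt.append(best_10)
        level = nxt
        count += len(level)
    leaves = [(D + weight(A) * label[-1], label) for label, A, D in level]
    return min(leaves), count
```

Because the tree shape does not depend on the passenger data, the node count for m = 4 is always 2 + 4 + 7 + 11 = 24, matching the recurrences used in Lemma 6.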



5.4 Analysis of the Algorithm 

Because of the pruning described in Lemma 5, the number of nodes in the enumeration tree can be reduced significantly from O(2^m) to O(m³), as is stated by the following lemma. Moreover, this leads to an overall worst-case running time of O(m⁵) for the pruned enumeration tree algorithm.

Lemma 6. The pruned enumeration tree has O(m³) nodes.

Proof. In order to count the number of nodes in the enumeration tree, we define the following variables:

N⁰(i): the number of nodes (x_1, ..., x_{i−1}, 0) at level i.

N¹(i): the number of nodes (x_1, ..., x_{i−1}, 1) at level i.

From the definition of the initialization phase, it follows that N⁰(1) = N¹(1) = 1. At level i + 1, a child node (..., 1) is created for every parent node (..., 0), and also for every parent node (..., 1). Therefore, N¹(i + 1) = N⁰(i) + N¹(i). Further, for every parent node (..., 0), a child node (..., 0) is created. But, by Lemma 5, one single child node (..., 0) is created for all parent nodes (..., 1). This yields N⁰(i + 1) = N⁰(i) + 1. Solving the recurrences

$$
\begin{aligned}
N^0(1) &= 1, & N^1(1) &= 1,\\
N^0(i+1) &= N^0(i) + 1, & N^1(i+1) &= N^0(i) + N^1(i)
\end{aligned}
$$

gives N⁰(i) = i and N¹(i) = ½i² − ½i + 1. So, the total number of nodes at level i is O(m²). With m levels, the pruned enumeration tree has O(m³) nodes in total.
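The recurrences and their closed forms can be checked numerically. The short Python sketch below (function names are ours) iterates N⁰ and N¹ exactly as in the proof and compares them with N⁰(i) = i and N¹(i) = (i² − i)/2 + 1.

```python
def node_counts(m):
    """Per-level counts (N0(i), N1(i)) of the pruned tree for i = 1..m."""
    n0, n1 = 1, 1
    counts = [(n0, n1)]
    for _ in range(m - 1):
        n0, n1 = n0 + 1, n0 + n1   # simultaneous update: both use level-i values
        counts.append((n0, n1))
    return counts


def closed_form(i):
    """N0(i) = i and N1(i) = i^2/2 - i/2 + 1 (i^2 - i is always even)."""
    return i, (i * i - i) // 2 + 1
```

For m = 4 the per-level counts are (1, 1), (2, 2), (3, 4) and (4, 7), i.e. 24 nodes in total.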



Theorem 2. The railway delay management problem on a corridor with multiple {0, 1} primary passenger path delays can be solved in O(m⁵) time.

Proof. The pruned enumeration tree has O(m³) nodes, and at each node at most |𝒫| = O(m²) paths have to be evaluated to compute the functions A(·) and D(·).

5.5 A Dynamic Programming View 

This section shows that the pruned enumeration tree algorithm can also be written as a dynamic program. We use the function z_10(i, k) that denotes the minimum value of all partial solutions where anything can have happened until train e_{k−2}, train e_{k−1} departs delayed, and all subsequent trains e_k, ..., e_i depart as scheduled. Similarly, the function z_101(i, j, k) denotes the minimum value of all partial solutions where anything can have happened until train e_{k−2}, train e_{k−1} departs delayed, the next trains e_k, ..., e_{j−1} depart as scheduled, and all subsequent trains e_j, ..., e_i depart delayed again. The subscripts of the functions z_10 and z_101 stand for the structure of the last significant events in the functions' partial solutions.

The recursion formulas for z_10(i, k) and z_101(i, j, k) correspond uniquely to the four cases for the updates of the functions D(x_1, ..., x_{i+1}) for the enumeration tree node (x_1, ..., x_{i+1}). In particular, the case i + 1 = k for z_10(i + 1, k) corresponds to the pruning of the enumeration tree when only a single (..., 1, 0) node is created at level i + 1. Because of these clear one-to-one correspondences, we do not describe the recursion formulas here, but refer to [GGJ+04] for a formal description.



6 Further Results 

A careful analysis of the algorithm in Section 5 shows that it can be applied to 
an out-tree graph G. The dynamic program discloses that it can also be carried 
out backwards. Hence, it follows that the same approach can also be applied to 
in-trees. Further, the dynamic programming algorithm can be extended for the 
case of K primary delay categories, that is, δ(P) ∈ {δ_1, ..., δ_K}. However, this
extension comes at the cost of a worst case running time of 
To our knowledge, no hardness result is yet known for the non-constrained single source delay management problem. Instead, we consider a model where passengers choose their route dynamically, according to the connections causing them the least delay. For this model, we have shown that:

Theorem 3. The decision version of the delay management problem where passengers choose their route dynamically is NP-complete.

The reduction is from 3-SAT, and is omitted due to lack of space. It makes prominent use of the fact that a route can, and has to, choose between alternatives, which makes it easy to ensure that only one of the literals is set to true, and also that every clause has at least one true literal. Refer to [GGJ+04] for details.

7 Future Research 

This paper discussed the algorithmic complexity of three variants of the event-activity model for railway delay management. However, delay management is also an on-line problem. In such a setting, the primary delays are not all known to the algorithm in advance. On-line approaches are probably of particular interest to practical users. We have some preliminary results for on-line models on a path. Extensions to general graphs with multiple delays are of great interest, and decision policies for such models could be useful in practice as well.
Abstract. We introduce a framework for reducing the number of comparisons performed in the deletion and minimum deletion operations for priority queues. In particular, we give a priority queue with constant cost per insertion and minimum finding, and logarithmic cost with at most log n + O(log log n) comparisons¹ per deletion and minimum deletion, improving over the bound of 2 log n + O(1) comparisons for the binomial queues and the pairing heaps. We also give a priority queue that supports, in addition to the above operations, the decrease-key operation. This latter priority queue achieves, in the amortized sense, constant cost per insertion, minimum finding and decrease-key operations, and logarithmic cost with at most 1.44 log n + O(log log n) comparisons per deletion and minimum deletion.



1 Introduction 

One of the major research issues in the field of theoretical computer science is the comparison complexity of comparison-based problems. In this paper, we consider priority queue structures that have constant insertion cost, aiming to reduce the number of comparisons involved in the delete-min operation. Binary heaps [22] are therefore excluded, since log log n comparisons are necessary and sufficient per insertion [13]. Gonnet and Munro [13] (corrected by Carlsson [5]) also showed that log n + log* n + O(1) comparisons are necessary and sufficient for deleting the minimum of a binary heap.

Several priority queues that achieve constant insertion cost and logarithmic cost per delete and delete-min have appeared in the literature. Examples of such heap
structures that achieve these bounds in the amortized sense [19] are the binomial 
queues [2,20] and the pairing heaps [10,14]. The same bounds can be achieved in 
the worst case with a special implementation of the binomial queues. If we allow 
the decrease-key operation, the Fibonacci heaps [11] and the thin heaps [15] 
achieve, in the amortized sense, constant cost per insert, find-min and decrease- 
key, and logarithmic cost per delete and delete-min. Other heap structures that 
achieve such bounds in the worst case are the relaxed heaps [7], and the priority 
queues in [3,4,15]. 

¹ log x equals max(log₂ x, 1).
Among the priority queues that achieve constant insertion cost, the number of comparisons performed in the delete-min operation of the binomial queues and the pairing heaps is bounded by 2 log n + O(1). The multiplicative constant involved in the logarithmic factor is more than 2 for the other priority queues mentioned above.

Since our new heap structures use binomial queues as a building block, we 
review the operations of the binomial queues in the next section. In Section 3, we 
give a structure that achieves, in the amortized sense, constant cost per insert 
and find-min, and logarithmic cost with at most log n + O(log log n) comparisons per delete and delete-min. In Section 4, we modify our structure to achieve the same bounds in the worst case. In Section 5, we give a priority queue that supports, in addition to the above operations, the decrease-key operation. This latter priority queue achieves, in the amortized sense, constant cost per insert, find-min and decrease-key, and logarithmic cost with at most 1.44 log n + O(log log n) comparisons per delete and delete-min. As an application of our layered heap structures, we show in Section 6 that using our first priority queue in the adaptive Heap-sort algorithm in [17] achieves a bound of at most n log(I/n) + O(n log log(I/n)) comparisons, where I is the number of inversions in the input sequence. This result matches the bound of the heap-based adaptive sorting algorithm in [9].

The question of whether there exists a priority queue that achieves the above bounds and at most log n + O(1) comparisons per delete-min is still open. A similar question with respect to the comparisons required by the dictionary operations was answered in the affirmative, with a bound of log n + O(1), by Andersson and Lai [1]. However, the trees of Andersson and Lai achieve the bound of log n + O(1) comparisons only in the amortized sense. The existence of such a worst-case bound is another open problem.

2 Binomial Queues 

A binomial tree [2,20] of rank r is constructed recursively by making the root of a binomial tree of rank r − 1 the leftmost child of the root of another binomial tree of rank r − 1. A binomial tree of rank 0 is a single node. The following properties follow from the definition:

- The rank of an n-node (assume n is a power of 2) binomial tree is log n.
- The root of a binomial tree of rank r has r sub-trees, each of which is a binomial tree, having respective ranks 0, 1, ..., r − 1 from right to left.

To represent a set of n elements, where n is not necessarily a power of 2, we 
use a forest having a tree of rank i if the i-th position of the binary representation
of n is a 1-bit. A binomial queue is such a forest with the additional constraint 
that the value of every node is smaller than or equal to the values of its children. 

Each binomial tree within a binomial queue is implemented using the binary 
representation. In such an implementation, every node has two pointers, one 
pointing to its left sibling and the other to its leftmost child. The sibling pointer 
of the leftmost child points to the rightmost child forming a circular list. Given 
a pointer to a node, both its rightmost and leftmost children can be accessed
in constant time. The list of its children can be sequentially accessed from right 
to left. To support the delete operation, each node will, in addition, have a 
pointer to its parent. The roots of the binomial trees within a binomial queue 
are organized in a linked list, which is referred to as the root-list. 

Two binomial trees of the same rank can be merged in constant time, by 
making the root of the tree that has the larger value the leftmost child of the 
other root. The following operations are defined on binomial queues: 

Insert. The new element is added to the forest as a tree of rank 0, and successive 
merges are performed until there are no two trees of the same rank. (This is 
equivalent to adding 1 to a number in the binary representation.) 

Delete-min. The root with the smallest element is removed, thus leaving all the 
sub-trees of that element as independent trees. Trees of equal ranks are then 
merged until no two trees of the same rank remain. The new minimum among 
the current roots of the trees is then found and maintained. 

Delete. The key of the node to be deleted is repeatedly swapped with its par- 
ents up to the root of its tree. A delete-min is then performed to delete this node. 

For an n-node binomial queue, the worst-case cost per insert, delete-min and delete is O(log n). The amortized bound on the number of comparisons per insert is 2, and per delete-min is 2 log n. To see that this bound is tight, consider a binomial queue with n one less than a power of 2. Consider an alternating sequence of a delete-min that is followed by an insert, such that the minimum element is always the root of the tree with the largest rank. For every delete-min in such a case, we need ⌊log n⌋ comparisons for merging trees with equal ranks and ⌊log n⌋ comparisons to find the new minimum.

3 A Structure with the Claimed Amortized Bounds 

For the binomial queues, there are two major procedures that contribute to the 
multiplicative factor of 2 in the bound on the number of comparisons for the 
delete-min operation. The first is merging the trees with equal ranks, and the 
second is maintaining the new minimum element. 

The basic idea of our heap structure is to reduce the number of comparisons 
involved in finding the new minimum, after the deletion of the current minimum, 
to O(log log n). This is achieved by implementing the original queue as a binomial queue, while having an upper layer forming another priority queue structure that only contains the elements of the roots of the binomial trees of the original queue. The minimum element of this upper layer is, therefore, the overall minimum element. The size of the upper layer is O(log n) and the delete-min requires O(log log n) comparisons for this layer. The challenge is how to maintain the upper layer and how to efficiently implement the priority queue operations on the lower layer (original queue) to reduce the work to be done at the upper layer,
achieving the claimed bounds. If the delete-min operation is implemented the 
same way as that of the standard binomial queues, there would be a logarithmic 
number of new roots that need to be inserted at the upper layer. Hence, a new 
implementation of the delete-min operation, that does not alter the current roots 
of the trees, is introduced. Next, we show how different priority queue operations 
are implemented for both layers. 



The Lower Layer 

Given a pointer from the upper layer to the tree that has the root with the 
minimum value, the delete-min is implemented as follows. After the minimum 
node is removed, its sub-trees are successively merged from right to left (sub-trees 
with lower ranks first) forming a new tree with one less node. In other words, 
every sub-tree is merged with the tree resulting from the merging of the sub-trees 
to its right, in an incremental fashion. The merging is done similar to that of the 
binomial trees; the root of the tree that has the larger value becomes the leftmost 
child of the other root. We call this procedure incremental merging. Even though 
such a tree loses a node, it is still assigned the same rank. We maintain a counter 
with every tree indicating the number of nodes deleted from this tree. When a 
binomial tree of rank r loses 2’’“^ nodes (half its full nodes), the tree is rebuilt 
in linear time forming a binomial tree of rank r — 1. If there exists another tree 
of rank r — I in the queue, these two trees are merged forming a tree of rank 
r, and the counters are updated. When a node is deleted by a delete operation, 
its children are merged by an incremental merging procedure, as above. The 
counter associated with the tree that involved the deletion is decremented and, 
whenever necessary, the rebuilding takes place as in the case of the delete-min 
operation. On the other hand, the insert operation is implemented in the same 
way as that of the standard binomial queues. 
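Incremental merging can be sketched in a few lines of Python. Here a tree is a list [key, children] with the leftmost child first (our own representation, with hypothetical helper names). For a binomial tree of rank r, removing the root and merging its r sub-trees from right to left takes exactly r − 1 comparisons and leaves a heap-ordered tree with 2^r − 1 nodes whose root has at most r children, consistent with the proof of Lemma 1 below.

```python
def link(a, b, counter):
    """Link two trees given as [key, children]: the root with the larger key
    becomes the leftmost child of the other root. Counts one comparison."""
    counter[0] += 1
    if b[0] < a[0]:
        a, b = b, a
    a[1].insert(0, b)
    return a


def build_binomial_tree(keys):
    """Heap-ordered binomial tree from 2**r distinct keys, built by linking
    equal-rank trees in rounds (its comparisons are not counted)."""
    trees = [[k, []] for k in keys]
    scratch = [0]
    while len(trees) > 1:
        trees = [link(trees[j], trees[j + 1], scratch)
                 for j in range(0, len(trees), 2)]
    return trees[0]


def incremental_merge(subtrees, counter):
    """Merge the sub-trees of a removed root from right to left: each sub-tree
    is linked with the tree obtained from all sub-trees to its right."""
    acc = None
    for sub in reversed(subtrees):
        acc = sub if acc is None else link(acc, sub, counter)
    return acc


def size_and_heap_ok(tree):
    """Number of nodes, and whether every parent key <= its children's keys."""
    key, children = tree
    total, ok = 1, True
    for child in children:
        n, child_ok = size_and_heap_ok(child)
        total += n
        ok = ok and child_ok and key <= child[0]
    return total, ok
```

Unlike the standard delete-min, the sub-trees are never left as separate roots, so the set of roots seen by the upper layer changes by at most one node.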

Lemma 1. The amortized number of comparisons performed in the delete and delete-min operations, at the lower layer, is bounded by log n + O(1).

Proof. Since merging two binomial trees of rank i results in a binomial tree of rank i + 1, successively merging two binomial trees of rank 0, then merging the resulting tree with another binomial tree of rank 1, and so on up to r − 1, results in a binomial tree of rank r. Starting with a binomial tree of rank r and incrementally merging the children of the root is similar, except for the first merge of the two trees of rank 0. In such a case, we may think of the resulting tree as a binomial tree of rank r that is missing one of its leaves. If we apply the same procedure again on the new tree, we may again think of the resulting tree as a binomial tree of rank r that is missing two of its nodes, and so on. It follows that the root of the resulting tree will have at most r children after each of these procedures. Hence, the number of comparisons required for an incremental merging of the children of this root is at most r − 1. Since the total number of nodes of such a tree is at least 2^{r−1}, the number of comparisons involved in an incremental merging is at most log n. The cost of the rebuilding is amortized over the deleted elements, for a constant cost per element deletion. □



The Upper Layer 

The upper layer is implemented as a standard binomial queue that contains the 
roots of the trees of the lower layer. When the node with the minimum value 
is to be deleted or when a delete operation is performed on the root of a tree 
at the lower layer, this node is also deleted from the upper layer with an extra O(log log n) cost. The node that is promoted in place of the deleted node at the lower layer is inserted at the upper layer with constant amortized cost. When a new node is inserted at the lower layer, this is accompanied by a constant number of merges (in the amortized sense). As a result of the merge operations, some of the roots of the trees at the lower layer are linked to other roots, and should be deleted from the upper layer. We can only afford to spend constant time on each of these deletions. Hence, a method of lazy deletions is applied,
where a node to-be-deleted is only marked until we have time for bulk deletions. 
If any of these marked nodes is again promoted as a root at the lower layer 
(as a result of a delete or delete-min), its mark at the upper layer is removed. 
When the number of marked nodes reaches a constant factor of the nodes of the 
upper layer (say half), the upper layer is rebuilt in linear time getting rid of the 
marked nodes. The cost of the rebuilding is amortized over the merges that took 
place at the lower layer, for a constant cost per merge. What makes the scheme 
of lazy deletions work is the fact that none of the marked nodes could possibly 
become the minimum node of the upper layer, since the upper layer must have 
an unmarked node smaller than each marked node. 
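The lazy-deletion scheme of the upper layer can be sketched as follows. This Python sketch substitutes the standard heapq module for the binomial queue used in the text and identifies lower-layer roots by their (assumed distinct) keys; only the marking, unmarking, and rebuild-at-half logic mirrors the description above.

```python
import heapq


class LazyUpperLayer:
    """Upper layer with lazy deletions (sketch; heapq replaces the binomial
    queue of the text, and keys are assumed to be distinct)."""

    def __init__(self):
        self.heap = []
        self.marked = set()

    def insert(self, key):
        if key in self.marked:        # promoted again as a lower-layer root:
            self.marked.discard(key)  # it is still stored, so just unmark it
        else:
            heapq.heappush(self.heap, key)

    def mark(self, key):
        """Lazily delete a key whose root was linked below another root."""
        self.marked.add(key)
        if 2 * len(self.marked) >= len(self.heap):   # half the nodes marked
            self._rebuild()

    def _rebuild(self):
        """Linear-time rebuild that discards all marked keys."""
        self.heap = [k for k in self.heap if k not in self.marked]
        heapq.heapify(self.heap)
        self.marked.clear()

    def find_min(self):
        # Invariant from the text: a marked key is never the minimum, since
        # the root it was linked below is smaller and still unmarked.
        assert self.heap[0] not in self.marked
        return self.heap[0]

    def delete_min(self):
        return heapq.heappop(self.heap)
```

The assertion in find_min only encodes the invariant argued in the text; the real structure does not need to perform such a check.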

Theorem 1. Our heap structure achieves, in the amortized sense, constant cost per insert and find-min, and logarithmic cost with at most log n + O(log log n) comparisons per delete and delete-min.

The bound on the number of comparisons for the delete and delete-min operations can be further reduced as follows. Instead of having two layers, we may
have several layers. The delete-min operation on the second layer is implemented
the same way as that of the first layer. In each layer other than the highest,
a pointer to the minimum element of that layer is maintained from the next
higher layer. Except for the highest layer, the logarithmic term in the number
of comparisons performed by a delete-min applied at a layer has a leading
constant of one. Therefore, the delete and delete-min operations perform at most
$\log n + \log\log n + \cdots + 2\log^{(k)} n + O(1)$ comparisons,
where $\log^{(k)}$ is the logarithm iterated $k$ times and $k$ is a constant representing
the number of layers. An insertion of a new element results in a constant
amortized number of insertions and markings per layer. The number of layers $k$
should be constant to achieve the constant amortized insertion cost.
4 A Structure with the Claimed Worst-Case Bounds

When the delete and delete-min operations are performed at the lower layer of 
our structure, the binomial trees lose nodes and their structure deviates from 
the initial binomial tree structure. Our amortized solution was to allow any tree 
to lose half its descendants and then rebuild the tree as a binomial tree. To 
achieve the claimed worst-case bound, the binomial trees must lose nodes in a 
more uniform way that maintains their structural properties. 



The Reservoir 

The basic idea is to keep one tree in the reservoir and treat it in a special way. 
Whenever a node is deleted as a result of a delete or a delete-min operation, 
we borrow a node from the tree in the reservoir. Using this borrowed node, the 
sub-tree that lost its root is readjusted as a binomial tree of the same structure 
as before the deletion. The details follow. 

To borrow a node from the tree of the reservoir, we detach the rightmost child 
of its root, making the children of the detached node the rightmost children of 
the root in the same order. Note that this can be done with constant cost and 
involves no comparisons. If there is only one node left in the reservoir, we borrow 
that node, mark its corresponding node at the upper layer, and move a binomial 
tree from the lower layer to the reservoir. A crucial constraint is that the rank 
of the tree being moved to the reservoir is not the largest among the trees of the 
lower layer. A special case is when there is only one tree at the lower layer. In 
such a case, we split that tree by cutting the leftmost sub-tree of its root, move
this sub-tree to the reservoir, and insert its root at the upper layer. 
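A minimal Python sketch of borrowing from the reservoir, assuming a binomial tree is stored as nodes with a `children` list ordered left to right (largest rank first) rather than with sibling pointers. `Node`, `link`, `borrow`, `build` and `size` are illustrative names, not the paper's. Borrowing only moves pointers, and in this run the root never has more children than its original rank.

```python
class Node:
    """Binomial-tree node; children[0] is the leftmost (largest-rank) child."""

    def __init__(self, key):
        self.key = key
        self.children = []


def link(a, b):
    """Link two binomial trees of equal rank using one comparison."""
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)      # larger root becomes the leftmost child
    return a


def borrow(root):
    """Borrow one node from the reservoir tree rooted at `root` (no comparisons).

    The rightmost child of the root is detached, and its own children become
    the rightmost children of the root, in the same order.
    """
    node = root.children.pop()
    root.children.extend(node.children)
    node.children = []
    return node


def build(keys):
    """Build a binomial tree from a power-of-two number of keys."""
    trees = [Node(k) for k in keys]
    while len(trees) > 1:
        trees = [link(trees[i], trees[i + 1]) for i in range(0, len(trees), 2)]
    return trees[0]


def size(t):
    return 1 + sum(size(c) for c in t.children)
```
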

Whenever a root of a binomial tree at the lower layer is deleted as a result 
of a delete-min operation, the node that is borrowed from the reservoir is incre- 
mentally merged with the sub-trees of the deleted root from right to left. This 
results in a binomial tree with the same structure as before the deletion, and 
requires at most log n comparisons. At the upper layer, the new root of this tree
is inserted and the node corresponding to the old root is deleted.
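Under the same assumed list-of-children representation, the repair after a delete-min at the lower layer might look as follows; `replace_deleted_root` and `is_binomial` are hypothetical names. The borrowed node plays the role of a rank-0 tree and absorbs the sub-trees of ranks 0, 1, ..., r-1 in turn, so the result is again a binomial tree of rank r, built with exactly r comparisons.

```python
class Node:
    """Binomial-tree node; children[0] is the leftmost (largest-rank) child."""

    def __init__(self, key):
        self.key = key
        self.children = []


def link(a, b):
    """Link two binomial trees of equal rank using one comparison."""
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a


def replace_deleted_root(root, spare):
    """Rebuild the tree of a deleted root from its sub-trees plus one spare node.

    The spare (a single node borrowed from the reservoir) is merged with the
    sub-trees from right to left; a root of rank r costs exactly r comparisons.
    """
    tree, comparisons = spare, 0
    for child in reversed(root.children):   # ranks 0, 1, ..., r-1
        tree = link(tree, child)
        comparisons += 1
    return tree, comparisons


def is_binomial(t, r):
    """True if t is a heap-ordered binomial tree of rank r."""
    return len(t.children) == r and all(
        c.key >= t.key and is_binomial(c, r - 1 - i)
        for i, c in enumerate(t.children))


def build(keys):
    """Build a binomial tree from a power-of-two number of keys."""
    trees = [Node(k) for k in keys]
    while len(trees) > 1:
        trees = [link(trees[i], trees[i + 1]) for i in range(0, len(trees), 2)]
    return trees[0]
```
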

When a node is deleted by a delete operation, the node that is borrowed from 
the reservoir is incrementally merged with the sub-trees of the deleted node from 
right to left. The key of the root of the resulting sub-tree is repeatedly compared
with the key of its parent, and the two are swapped if necessary. The number of
comparisons involved in this procedure is at most log n. If the value at the root
of the tree containing the deleted node changes, the new root is inserted at the
upper layer and the node corresponding to the old root is deleted.

If the root of the tree of the reservoir is to be deleted or if it has the minimum 
value in a delete-min operation, this root is removed and an incremental merging 
procedure is performed on its children from right to left. Starting with a binomial 
tree of rank r in the reservoir, the only guarantee is that the number of children 
of the root of this tree at any moment is at most r. This follows similarly to the
proof of Lemma 1. Since during the lifespan of the current tree of the reservoir
there is another binomial tree at the lower layer whose rank is at least r, the
number of comparisons involved in this procedure is, again, at most log n. At
the upper layer, the new root of the reservoir is inserted and the node
corresponding to the old root is deleted.

Insert. Similar to the binomial queues, the insert is performed by adding a new 
node of rank 0 to the lower layer. If there are two binomial trees of the same rank 
r in the queue, the two trees are to be merged forming a binomial tree of rank 
r + 1. We cannot afford to perform all the necessary merges at once. Instead, 
we do a constant number of merges with each insertion. Hence, merges will be 
left partially completed to pick up the work on the next operation. To facilitate 
this, we maintain a logarithmic number of pointers to the merges in progress 
and their structures, kept as a stack of pointers. With every insertion we make 
progress on the merge of the two smallest trees with the same rank. Similar 
to our amortized solution, we need to do the insertion and the lazy deletion 
(marking) at the upper layer. From the amortized analysis of binomial queues 
it is clear that we must do at least two comparisons per insertion, performing 
the merges. See [6] for a similar treatment. A nice result of [6] implies that the 
number of pointers can be reduced to log* n if three units of work are done with 
every insertion, instead of two. 
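The following Python sketch keeps the lower layer as binomial trees bucketed by rank and lets each insertion perform at most two links. It is a simplification under stated assumptions: a linear scan over ranks replaces the stack of pointers to merges in progress, each link merges two whole trees of equal rank, and the upper layer is omitted. The names `LowerLayer` and `links_per_insert` are hypothetical, and the sketch does not by itself establish the worst-case bound, which rests on the analysis cited from [6].

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.children = []   # children[0] is the leftmost (largest-rank) child


def link(a, b):
    """Link two binomial trees of equal rank using one comparison."""
    if b.key < a.key:
        a, b = b, a
    a.children.insert(0, b)
    return a


class LowerLayer:
    """Binomial trees bucketed by rank, with a bounded number of links per insert.

    Each insert adds a rank-0 tree and then performs at most `links_per_insert`
    links (one comparison each), always on two trees of the smallest rank that
    currently holds two or more trees.
    """

    def __init__(self, links_per_insert=2):
        self.by_rank = {}                    # rank -> roots of that rank
        self.links_per_insert = links_per_insert

    def insert(self, key):
        self.by_rank.setdefault(0, []).append(Node(key))
        for _ in range(self.links_per_insert):
            # Linear scan for the smallest rank holding two trees; the paper
            # instead keeps a stack of pointers to the merges in progress.
            pending = [r for r, roots in self.by_rank.items() if len(roots) >= 2]
            if not pending:
                break
            r = min(pending)
            a, b = self.by_rank[r].pop(), self.by_rank[r].pop()
            self.by_rank.setdefault(r + 1, []).append(link(a, b))

    def roots(self):
        return [t for roots in self.by_rank.values() for t in roots]


def size(t):
    return 1 + sum(size(c) for c in t.children)
```
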



Global Rebuilding of the Upper Layer 

To achieve the required worst-case bounds, we cannot afford to spend linear
time rebuilding the upper layer when the number of marked nodes reaches a
constant fraction of the total number of nodes. Instead, we use a technique similar
to the global rebuilding technique in [18]. When the number of unmarked nodes
goes below m/c, where m is the number of nodes at the upper layer and c > 2
is some constant, we start rebuilding the whole layer. The work is distributed
over the next operations. We still use and update our original upper layer, but
in parallel we also build a new heap structure. If a node to-be-marked also exists
in the new structure, it has to be marked there as well. Whenever a new node
is inserted in the current upper layer, we insert it in the new structure as well.
Whenever we mark a node for deletion or insert a new node in the current upper
layer, we copy two of the unmarked nodes from the current structure to the
new structure. It follows that within at most m/(2c) further operations, all the
unmarked nodes must have been copied to the new structure. At this point, we
can dismiss the current structure and use the new one instead, which will have
at least half of its nodes unmarked. Since c > 2, we are only busy constructing
at most one new structure at a time. The overall worst-case cost for an insertion
is bounded by a constant.
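A bookkeeping-only Python sketch of the incremental rebuilding: it tracks which nodes are stored and marked in the current and in the new structure, copies two nodes per insert or mark, and switches structures once copying is complete. Heap order is deliberately left out, and the class name `GloballyRebuiltLayer` and the default c = 3 are assumptions of the sketch.

```python
class GloballyRebuiltLayer:
    """Bookkeeping of an upper layer under incremental global rebuilding.

    Only which nodes are stored and which are marked is modelled (heap order is
    left out). The threshold constant c (> 2) is a parameter of this sketch.
    """

    def __init__(self, c=3):
        self.c = c
        self.nodes, self.marked = {}, set()   # current structure
        self.new, self.new_marked = None, set()
        self.to_copy = []                     # unmarked ids not yet copied

    def _step(self):
        if self.new is None:
            unmarked = len(self.nodes) - len(self.marked)
            if unmarked < len(self.nodes) / self.c:   # start a rebuild
                self.new, self.new_marked = {}, set()
                self.to_copy = [i for i in self.nodes if i not in self.marked]
            return
        for _ in range(2):                    # copy two nodes per operation
            if self.to_copy:
                i = self.to_copy.pop()
                self.new[i] = self.nodes[i]
                if i in self.marked:          # marked after the rebuild began
                    self.new_marked.add(i)
        if not self.to_copy:                  # switch to the new structure
            self.nodes, self.marked = self.new, self.new_marked
            self.new, self.new_marked = None, set()

    def insert(self, node_id, key):
        self.nodes[node_id] = key
        if self.new is not None:
            self.new[node_id] = key
        self._step()

    def mark(self, node_id):
        self.marked.add(node_id)
        if self.new is not None and node_id in self.new:
            self.new_marked.add(node_id)
        self._step()
```
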

Theorem 2. Our modified heap structure achieves constant cost per insert and
find-min, and logarithmic cost with at most log n + O(log log n) comparisons per
delete and delete-min.
5 A Structure Supporting the Decrease-Key Operation

We introduce the notion of an F-queue that is used as our basic structure. 



F-Queues 

An F-tree is recursively defined as follows. An F-tree of rank 0 is a single node.
An F-tree of rank r consists of a root node that has r or r-1 sub-trees. These
sub-trees, from right to left, are F-trees with consecutive ranks 0, 1, ..., r-1 or
0, 1, ..., r-2. We call an F-tree of rank r heavy if its root has r sub-trees, and
light if its root has r-1 sub-trees. Each node of an F-queue is implemented with
three pointers, which point to its left sibling, its right sibling and its leftmost
child. The left pointer of the leftmost child points to the parent instead of its
nonexistent left sibling.

Lemma 2. The number of descendants of an F-tree of rank $r$ is at least $\Phi^{r-1}$,
where $\Phi = (1+\sqrt{5})/2$ is the golden ratio.

Proof. Let $T_r$ be the size of an F-tree of rank $r$. It follows from the definitions
that $T_0 = 1$, $T_1 \ge 1$, and $T_r \ge 1 + \sum_{i=0}^{r-2} T_i$ for $r \ge 2$. Consider the Fibonacci
numbers defined as $F_0 = F_1 = 1$ and $F_r = F_{r-1} + F_{r-2}$ for $r \ge 2$. It follows
by induction that $T_r \ge F_r$ for all $r$. The inequality $F_r \ge \Phi^{r-1}$ is well known. It
follows that $r \le 1 + \log_\Phi T_r \approx 1.44 \log T_r + 1$. □
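The recurrence in the proof can be checked numerically. This Python snippet (function names hypothetical) computes the minimum sizes $T_r$ of F-trees from the recurrence, which is attained by light trees whose sub-trees are themselves minimal, and compares them with the Fibonacci numbers and with $\Phi^{r-1}$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2        # golden ratio


def min_f_tree_sizes(max_rank):
    """T[r] = fewest nodes in an F-tree of rank r, for r = 0..max_rank.

    Rank 0 is a single node and rank 1 can be a single node (a light tree).
    For r >= 2 the smallest tree is light with minimal sub-trees of ranks
    0, ..., r-2, so T[r] = 1 + T[0] + ... + T[r-2].
    """
    T = [1, 1]
    for r in range(2, max_rank + 1):
        T.append(1 + sum(T[:r - 1]))
    return T[:max_rank + 1]


def fibonacci(max_rank):
    """F[0] = F[1] = 1 and F[r] = F[r-1] + F[r-2]."""
    F = [1, 1]
    for r in range(2, max_rank + 1):
        F.append(F[-1] + F[-2])
    return F[:max_rank + 1]
```
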

An F-queue is a forest of F-trees with the additional constraint that the 
value of every node is smaller than or equal to the value of its children. The 
main F-trees are all heavy, a condition that is not necessarily true for sub-trees. 
Note the similarity between the definition of the F-queues and the thin heaps 
[15]. The following operations are defined on F-trees: 

Split. A heavy F-tree of rank r can be split into two F-trees by cutting the
leftmost sub-tree of the root from the rest of the tree. This leftmost sub-tree will
form an F-tree of rank r-1 (heavy or light), while the rest of the tree forms a
light F-tree of rank r. No comparisons are performed in this operation.

Merge. Two F-trees of the same rank r can be merged, in constant time and with
one comparison, resulting in an F-tree of rank r+1. Let x be the root of the tree
that has the larger value.

1. If the two trees are heavy: Link x as the leftmost child of the other root, 
forming a heavy F-tree. 

2. If the two trees are light: Decrement the rank of the tree of x to r-1,
converting it to a heavy F-tree. Link x as the leftmost child of the other
root, forming a light F-tree.

3. If one tree is light and the other is heavy: 
a) If the tree of x is the light tree: Link x as the leftmost child of the other
root, forming a heavy F-tree.

b) If the tree of x is the heavy tree: Split the tree of x into two F-trees of
ranks r-1 and r, as above. Link these two trees, the one of rank r-1
first, as the leftmost children of the other root, forming a heavy F-tree.
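The split and merge rules can be written down directly. The Python sketch below assumes each node keeps its rank and a list of sub-trees ordered left to right, rather than the three-pointer layout described above; `FNode`, `split`, `merge` and `valid` are illustrative names. It performs one key comparison per merge, and `valid` checks the F-tree shape and heap order.

```python
class FNode:
    """F-tree node; children[0] is the leftmost sub-tree (largest rank)."""

    def __init__(self, key, rank=0):
        self.key, self.rank, self.children = key, rank, []

    def is_heavy(self):
        return len(self.children) == self.rank


def split(t):
    """Heavy tree of rank r -> (leftmost sub-tree of rank r-1, light tree of rank r).

    No comparisons are made.
    """
    assert t.is_heavy() and t.rank >= 1
    left = t.children.pop(0)
    return left, t


def merge(a, b):
    """Merge two F-trees of equal rank r with one comparison; returns rank r+1."""
    assert a.rank == b.rank
    y, x = (a, b) if a.key <= b.key else (b, a)   # x: root with the larger key
    if x.is_heavy() and y.is_heavy():             # case 1: heavy result
        y.children.insert(0, x)
    elif not x.is_heavy() and not y.is_heavy():   # case 2: light result
        x.rank -= 1                               # x is now heavy of rank r-1
        y.children.insert(0, x)
    elif not x.is_heavy():                        # case 3a: heavy result
        y.children.insert(0, x)
    else:                                         # case 3b: heavy result
        left, x = split(x)
        y.children.insert(0, left)                # rank r-1 part first
        y.children.insert(0, x)                   # then x, of rank r
    y.rank += 1
    return y


def valid(t):
    """Check the F-tree shape (ranks right to left 0, 1, ...) and heap order."""
    k = len(t.children)
    return k in (t.rank, t.rank - 1) and all(
        c.rank == k - 1 - i and c.key >= t.key and valid(c)
        for i, c in enumerate(t.children))
```

With keys handed out in increasing order, the first argument of `merge` always holds the smaller root, so trying all four heavy/light combinations exercises cases 1, 2, 3a and 3b.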

Delete-min. Given a pointer from the upper layer to the tree that has the root 
with the minimum value, the delete-min is implemented as follows. After this 
minimum node is removed, we perform an incremental merging procedure on 
its children, from right to left. We may think of the single node that is the
rightmost child of this root as a light tree of rank 1. It follows that the resulting
tree will still be an F-tree of the same rank. If the resulting F-tree is light, make
it heavy by decrementing its rank. At the upper layer, the new root of this
tree is inserted and the node corresponding to the old root is deleted. We then
proceed by merging any two F-trees that have the same rank, until no two trees
of the same rank exist. This is done in a similar way to that of the
Fibonacci heaps and the thin heaps. With each of these merges, a corresponding
node at the upper layer is marked for lazy deletion. The amortized number of
comparisons performed by this operation is at most 1.44 log n + O(log log n).
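A self-contained Python sketch of the first part of delete-min, the rebuilding of a single F-tree after its root is removed, under the same list-of-children assumption as before. It repeats a compact `merge`; the upper-layer update and the subsequent consolidation of equal-rank trees are not shown, and `delete_root` is a hypothetical name.

```python
class FNode:
    """F-tree node; children[0] is the leftmost sub-tree (largest rank)."""

    def __init__(self, key, rank=0):
        self.key, self.rank, self.children = key, rank, []

    def is_heavy(self):
        return len(self.children) == self.rank


def split(t):
    """Heavy rank r -> (leftmost sub-tree of rank r-1, t as a light tree of rank r)."""
    return t.children.pop(0), t


def merge(a, b):
    """One-comparison merge of two F-trees of equal rank r (cases 1 to 3b)."""
    y, x = (a, b) if a.key <= b.key else (b, a)   # x has the larger key
    if x.is_heavy() == y.is_heavy():
        if not x.is_heavy():
            x.rank -= 1                # case 2: light x becomes heavy of rank r-1
        y.children.insert(0, x)        # cases 1 and 2
    elif not x.is_heavy():
        y.children.insert(0, x)        # case 3a
    else:
        left, x = split(x)             # case 3b: rank r-1 part first, then x
        y.children.insert(0, left)
        y.children.insert(0, x)
    y.rank += 1
    return y


def delete_root(t):
    """Remove the root of a main (heavy) F-tree of rank r >= 1.

    Merges the root's sub-trees from right to left (r - 1 comparisons) and
    returns the result made heavy.
    """
    kids = t.children
    tree = kids[-1]                  # the rank-0 child, a single node ...
    tree.rank = 1                    # ... viewed as a light tree of rank 1
    for child in reversed(kids[:-1]):
        tree = merge(tree, child)    # partners have ranks 1, 2, ..., r-1
    if not tree.is_heavy():
        tree.rank -= 1               # light -> heavy
    return tree
```
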

Decrease-key. Let x be the node whose value is decreased. If x is a root of a tree
at the lower layer, a decrease-key operation is performed on the corresponding
node at the upper layer and the procedure terminates. Otherwise, the sub-tree of
x is cut and made a new tree. If this tree is light, make it heavy by decrementing
its rank. The node x is then inserted at the upper layer. Consider the position
of x before the cut, and let y be its parent. The left siblings of x are traversed
from right to left, until a heavy sub-tree is encountered (if such a sub-tree
exists). The rank of the root of each of the traversed light sub-trees is
decremented, making these sub-trees heavy. If we encountered a heavy sub-tree, 
it is split into two sub-trees as mentioned above, and the procedure terminates. 
If all the left siblings of x were light and y was heavy, the procedure terminates 
and y becomes light. If the left siblings of x were light and y was also light, the 
sub-tree of y is cut and made a new tree. The rank of y is adjusted, making 
its tree heavy. The node y is then inserted at the upper layer. The procedure 
continues considering the left siblings of y, and the process is repeated until either 
no structural problem exists, or until we reach a root of a tree. The amortized 
cost of this operation is constant [15]. 
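The cascading cuts can be sketched in Python as follows. Assumptions: sub-trees are kept in Python lists instead of sibling pointers, the caller passes the path from a main root down to x (`path`), and upper-layer insertions are left to the caller. The text does not spell out what happens when the parent is a main root that turns light; since main F-trees must be heavy, this sketch decrements that root's rank, which is our choice rather than a rule stated in the paper.

```python
class FNode:
    """F-tree node; children[0] is the leftmost sub-tree (largest rank)."""

    def __init__(self, key, rank=0):
        self.key, self.rank, self.children = key, rank, []

    def is_heavy(self):
        return len(self.children) == self.rank


def split(t):
    """Heavy rank r -> (leftmost sub-tree of rank r-1, t as a light tree of rank r)."""
    return t.children.pop(0), t


def decrease_key(path, new_key):
    """Decrease the key of path[-1]; path lists the nodes from a main root down.

    Returns the sub-trees that were cut off (each made heavy); in the full
    structure each of their roots would also be inserted at the upper layer.
    """
    path[-1].key = new_key
    cut = []
    i = len(path) - 1
    while i > 0:
        x, y = path[i], path[i - 1]
        pos = y.children.index(x)
        del y.children[pos]
        x.rank = len(x.children)            # light -> heavy (no-op if heavy)
        cut.append(x)
        j = pos - 1                         # left siblings of x, right to left
        while j >= 0 and not y.children[j].is_heavy():
            y.children[j].rank -= 1         # light rank k -> heavy rank k-1
            j -= 1
        if j >= 0:                          # heavy sibling: split it and stop
            left, _ = split(y.children[j])
            y.children.insert(j + 1, left)
            return cut
        if y.rank == len(y.children) + 1:   # y was heavy and is now light
            if i == 1:                      # y is the main root: keep it heavy
                y.rank -= 1
            return cut
        i -= 1                              # y was light: cut y as well
    return cut
```
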

Delete. To delete a node, its value is decreased to become the smallest among 
the nodes of the heap. It is then deleted by applying a delete-min operation. 

Theorem 3. Our modified heap structure achieves, in the amortized sense, constant
cost per insert, find-min and decrease-key, and logarithmic cost with at most
1.44 log n + O(log log n) comparisons per delete and delete-min.
6 Application: Adaptive Heap-Sort

An adaptive sorting algorithm is a sorting algorithm that benefits from the 
presortedness in the input sequence and sorts faster. There are plenty of adaptive 
sorting algorithms in the literature, and many measures defining how the input 
sequence is presorted. One of the simplest adaptive sorting algorithms introduced 
in the literature is the adaptive Heap-sort algorithm [17]. The first step of the 
algorithm is to build a Cartesian tree from the input sequence. 

Given a sequence $X = \langle x_1, \ldots, x_n \rangle$, the corresponding Cartesian tree
[21] is a binary tree with root $x_i = \min(x_1, \ldots, x_n)$. Its left sub-tree is the
Cartesian tree for $\langle x_1, \ldots, x_{i-1} \rangle$ and its right sub-tree is the Cartesian tree
for $\langle x_{i+1}, \ldots, x_n \rangle$. A Cartesian tree can be constructed in linear time [12].
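One standard way to build a Cartesian tree in linear time is the stack-based construction sketched below in Python; we do not claim it is the construction used in [12], and the function name is illustrative. Each index is pushed and popped at most once, which gives the linear bound.

```python
def cartesian_tree(xs):
    """Linear-time (min-)Cartesian tree construction with a stack.

    Returns (root, left, right), where left[i] and right[i] are child indices
    (-1 if absent). Ties are resolved by keeping the earlier element higher.
    """
    n = len(xs)
    left, right = [-1] * n, [-1] * n
    stack = []                      # indices on the rightmost path, keys increasing
    for i, x in enumerate(xs):
        last = -1
        while stack and xs[stack[-1]] > x:
            last = stack.pop()      # popped chain becomes i's left sub-tree
        left[i] = last
        if stack:
            right[stack[-1]] = i    # i hangs to the right of the remaining top
        stack.append(i)
    return (stack[0] if stack else -1), left, right
```

For `[5, 2, 8, 1, 9, 3]` the root is index 3 (the key 1), whose left sub-tree is rooted at index 1 (the key 2) and whose right sub-tree is rooted at index 5 (the key 3).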

After building a Cartesian tree, the adaptive Heap-sort proceeds by inserting
the smallest element of the Cartesian tree in a heap. During the iterative step, the
minimum of the heap is deleted and printed, and the children of this element
in the Cartesian tree are inserted in the heap. The total work done by the
algorithm is, therefore, $n$ insertions and $n$ minimum deletions plus the linear work
involved in building and querying the Cartesian tree. Levcopoulos and Petersson
[17] showed that the number of elements of the heap at step $i$ is not greater than
$\lfloor \|Cross(x_i)\|/2 \rfloor + 2$, where $x_i$ is the smallest element of the heap at step $i$ and

$Cross(x_i) = \{ j \mid 1 \le j < n \text{ and } \min(x_j, x_{j+1}) < x_i < \max(x_j, x_{j+1}) \}$.
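Putting the pieces together, the adaptive Heap-sort loop can be sketched in Python with `heapq` as the binary heap suggested in [17]. The function also reports the largest heap size it observes, the quantity bounded above; the function names are illustrative, and the Cartesian-tree construction is the stack-based one, repeated so that the block is self-contained.

```python
import heapq


def cartesian_tree(xs):
    """Stack-based (min-)Cartesian tree: returns (root, left, right) indices."""
    left, right, stack = [-1] * len(xs), [-1] * len(xs), []
    for i, x in enumerate(xs):
        last = -1
        while stack and xs[stack[-1]] > x:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    return (stack[0] if stack else -1), left, right


def adaptive_heapsort(xs):
    """Return (sorted list, largest heap size seen during the loop)."""
    if not xs:
        return [], 0
    root, left, right = cartesian_tree(xs)
    heap = [(xs[root], root)]          # start with the smallest element
    out, peak = [], 0
    while heap:
        peak = max(peak, len(heap))
        _, i = heapq.heappop(heap)     # delete the minimum and output it
        out.append(xs[i])
        for c in (left[i], right[i]):  # insert its Cartesian-tree children
            if c != -1:
                heapq.heappush(heap, (xs[c], c))
    return out, peak
```

On already sorted input the Cartesian tree is a single right path, so the heap never holds more than one element.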

They [17] suggested using a binary heap, and showed that their algorithm
runs in $O(n \log (Osc(X)/n))$, where

$Osc(X) = \sum_{i=1}^{n} \|Cross(x_i)\|$.

They also showed that, using a binary heap, the number of comparisons
performed by the $n$ insertions and the $n$ minimum deletions is at most $2.5 n \log n$.
Using the fact that $Osc(X) \le 4 \cdot Inv(X)$ [17], it follows that adaptive Heap-sort
runs in $O(n \log (Inv(X)/n))$, where $Inv(X)$ is the number of inversions in $X$:

$Inv(X) = \|\{ (i,j) \mid 1 \le i < j \le n \text{ and } x_i > x_j \}\|$.
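For small inputs the two measures can be computed by brute force. This quadratic Python sketch (illustrative only, not an efficient algorithm) computes $Cross$, $Osc$ and $Inv$ as defined above, with 0-based indices, and checks $Osc(X) \le 4 \cdot Inv(X)$ on every permutation of six elements.

```python
from itertools import permutations


def cross(xs, i):
    """Cross(x_i), 0-based: indices j with min(x_j, x_{j+1}) < x_i < max(x_j, x_{j+1})."""
    return [j for j in range(len(xs) - 1)
            if min(xs[j], xs[j + 1]) < xs[i] < max(xs[j], xs[j + 1])]


def osc(xs):
    """Osc(X): the sum of |Cross(x_i)| over all i."""
    return sum(len(cross(xs, i)) for i in range(len(xs)))


def inv(xs):
    """Inv(X): the number of pairs i < j with x_i > x_j."""
    n = len(xs)
    return sum(1 for i in range(n) for j in range(i + 1, n) if xs[i] > xs[j])


# Exhaustive check of Osc(X) <= 4 * Inv(X) on all permutations of six elements.
assert all(osc(p) <= 4 * inv(p) for p in permutations(range(6)))
```

For example, for $X = \langle 2, 4, 1, 3 \rangle$ the element 2 is straddled by the pairs (4, 1) and (1, 3), and 3 by the pairs (2, 4) and (4, 1), so $Osc(X) = 4$ while $Inv(X) = 3$.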

Using our layered heaps instead of the binary heap, we achieve a bound of
$\sum_{i=1}^{n} \left( \log \|Cross(x_i)\| + O(\log\log \|Cross(x_i)\|) \right)$ comparisons, which is at
most $n \log (Osc(X)/n) + O(n \log\log (Osc(X)/n))$, and hence a bound of
$n \log (Inv(X)/n) + O(n \log\log (Inv(X)/n))$ comparisons, which is optimal up to the lower order terms.

This bound matches the bound of the heap-based adaptive sorting algorithm in
[9], achieved using different ideas. Although there exist adaptive sorting
algorithms that achieve a bound of $n \log (Inv(X)/n) + O(n)$ comparisons [8], these
algorithms are based on either Insertion-sort or Merge-sort. The problem of
achieving a bound of $n \log (Inv(X)/n) + O(n)$ comparisons by a heap-based adaptive
sorting algorithm is still open.
Abstract. We show that any priority queue data structure that supports
insert, delete, and find-min operations in pq(n) time, when there
are up to n elements in the priority queue, can be converted into a priority
queue data structure that also supports meld operations at essentially no
extra cost, at least in the amortized sense. More specifically, the new data
structure supports insert, meld and find-min operations in O(1) amortized
time, and delete operations in $O(pq(n)\,\alpha(n, n/pq(n)))$ amortized
time, where $\alpha(m, n)$ is a functional inverse of the Ackermann function.
For all conceivable values of pq(n), the term $\alpha(n, n/pq(n))$ is constant.
This holds, for example, if $pq(n) = \Omega(\log^* n)$. In such cases, adding the
meld operation does not increase the amortized asymptotic cost of the
priority queue operations. The result is obtained by an improved analysis
of a construction suggested recently by three of the authors in [14].
The construction places a non-meldable priority queue at each node of a
union-find data structure. We also show that when all keys are integers
in [1, N], we can replace n in all the bounds stated above by N.



1 Introduction 

Priority queues are basic data structures used by many algorithms. The most 
basic operations, supported by all priority queues, are insert, which inserts an 
element with an associated key into the priority queue, and extract-min, which 
returns the element with the smallest key currently in the queue, and deletes it. 
These two operations can be used, for example, to sort n elements by performing 
n insert operations followed by n extract-min operations. Most priority queues 
also support a delete operation, that deletes a given element from the queue, and 
find-min, which finds, but does not delete, an element with minimum key. 

Using the insert and delete operations we can easily implement a decrease-key 
operation, or more generally a change-key operation, that decreases, or arbitrar- 
ily changes, the key of a queue element. (We simply delete the element from 
the queue and re-insert it with its new key.) As the decrease-key operation is 
the bottleneck operation in efficient implementations of Dijkstra’s single-source 
shortest paths algorithm [3], and Prim’s algorithm [15] for finding a minimum
spanning tree, many priority queues support this operation directly, sometimes
in constant time. The efficient implementation of several algorithms, such as the
algorithm of Edmonds [4] for computing optimum branching and minimum directed
spanning trees, requires the maintenance of a collection of priority queues.
In addition to the standard operations performed on individual priority queues,
we also need, quite often, to meld, or unite, two priority queues from this collection.
This provides a strong motivation for studying meldable priority queues.

* Research at Princeton University partially supported by the Aladdin project, NSF
Grant CCR-9626862.

Fibonacci heaps, developed by Fredman and Tarjan [5], are very elegant and 
efficient meldable priority queues. They support delete operations in O(log n)
amortized time, and all other operations, including meld operations, in O(1)
amortized time, where n is the size of the priority queue from which an element
is deleted. (For a general discussion of amortized time bounds, see [17].) Brodal
[2] obtained a much more complicated data structure that supports delete
operations in O(log n) worst-case time, and all other operations in O(1) worst-case
time. Both these data structures are comparison-based and can handle elements
with arbitrary real keys. In this setting they are asymptotically optimal.

While O(log n) is the best delete time possible in the comparison model, much
better time bounds can be obtained in the word RAM model of computation,
as was first demonstrated by Fredman and Willard [6,7]. In this model each
key is assumed to be an integer that fits into a single word of memory. Each
word of memory is assumed to contain w ≥ log n bits. The model allows random
access to memory, as in the standard RAM model of computation. The set of 
basic word operations that can be performed in constant time are the standard 
word operations available in typical programming languages (e.g., C): addition, 
multiplication, bit-wise and/or operations, shifts, and their like. 

Thorup [18,19] obtained a general equivalence between priority queues and 
sorting. More specifically, he showed that if n elements can be sorted in O(n f(n))
time, where f(n) is a non-decreasing function, then the basic priority queue
operations can be implemented in O(f(n)) time. Using a recent O(n log log n)
sorting algorithm of Han [9], this gives priority queues that support all operations
in O(log log n) time. Thorup [20] extends this result by presenting a priority
queue data structure that supports insert, find-min and decrease-key operations
in O(1) time and delete operations in O(log log n) time. (This result is not implied
directly by the equivalence to sorting.) Han and Thorup [10] recently obtained a
randomized $O(n\sqrt{\log\log n})$ time sorting algorithm. This translates into priority
queues with $O(\sqrt{\log\log n})$ expected time per operation.

Adding a Meld Operation 

The priority queues mentioned in the previous paragraph do not support meld 
operations. Our main result is a general transformation that takes these priority 
queues, or any other priority queue data structure, and produces new priority 
queue data structures that do support the meld operation with essentially no 
increase in the amortized cost of the operations! We show that any priority 
queue data structure that supports insert, delete, and find-min operations in 
pq(n) time, where n is the number of elements in the priority queue, can be
converted into a priority queue data structure that also supports meld operations
at essentially no extra cost, at least in the amortized sense. More specifically,
the new data structure supports insert, meld and find-min operations in O(1)
amortized time, and delete operations in $O(pq(n)\,\alpha(n, n/pq(n)))$ amortized time,
where $\alpha(m, n)$ is a functional inverse of the Ackermann function (see [16]). For
all conceivable values of pq(n), the factor $\alpha(n, n/pq(n))$ is constant. This holds,
for example, if $pq(n) = \Omega(\log^* n)$. In such cases, adding the meld operation does
not increase the amortized asymptotic cost of the priority queue operations. If 
the original priority queue is deterministic, so is the new one. 

The result is obtained by an improved analysis of a construction suggested
recently by three of the authors (see [14]). This construction places a non-meldable
priority queue at each node of a union-find data structure. The simple analysis
given in [14] gave an upper bound of $O(pq(n)\,\alpha(n, n))$ on the cost of all
priority queue operations. Here we reduce the amortized cost of insert, meld and
find-min operations to O(1), and more importantly, reduce the amortized cost
of delete operations to $O(pq(n)\,\alpha(n, n/pq(n)))$. In other words, we replace the
factor $\alpha(n, n)$ by $\alpha(n, n/pq(n))$. This is significant as $\alpha(n, n/pq(n))$ is constant
for all conceivable values of pq(n), e.g., if $pq(n) = \Omega(\log^* n)$.

Applying this result to non-meldable priority queue data structures obtained
recently by Thorup [19], and by Han and Thorup [10], we obtain meldable
RAM priority queues with O(log log n) amortized cost per operation, or
$O(\sqrt{\log\log n})$ expected amortized cost per operation, respectively. Furthermore,
Thorup’s equivalence between priority queues and sorting and the transformation
presented here imply that any sorting algorithm that can sort n elements
in O(n f(n)) time, where f(n) is a non-decreasing function, can be used to construct
meldable priority queues with O(1) amortized cost for insert, find-min and
meld operations, and $O(f(n)\,\alpha(n, n/f(n)))$ amortized cost for delete operations.

As a by-product of the improved meldable priority queues mentioned above, 
we obtain improved algorithms for the minimum directed spanning tree problem 
in graphs with integer edge weights: a deterministic O(m log log n) time algorithm
and a randomized $O(m\sqrt{\log\log n})$ time algorithm. These bounds improve,
for sparse enough graphs, on the O(m + n log n) running time of an algorithm
by Gabow et al. [8] that works for arbitrary edge weights. For more details (and 
references) on directed spanning tree algorithms, see [14]. 

Although the most interesting results are obtained by applying our transfor- 
mation to RAM priority queues, the transformation itself only uses the capabil- 
ities of a pointer machine. 



Improvement for Smaller Integer Keys 

We also show, using an independent transformation, that when all keys are
integers in the range $[1, N]$, all occurrences of $n$ in the bounds above can be
replaced by $N$, or more generally, by $\min\{n, N\}$. This, in conjunction with the
previous transformation, allows us, for example, to add a meld operation, with
constant amortized cost, to the priority queue of van Emde Boas [22,23], which
has $pq(n) = O(\log\log N)$. The amortized cost of a delete operation is then:

$O(\log\log N \cdot \alpha(\min\{n, N\}, \min\{n, N\}/\log\log N)) = O(\log\log N \cdot \alpha(N, N/\log\log N)) = O(\log\log N)$.

(The original data structure of van Emde Boas requires randomized hashing to
run in linear space [13]. A deterministic version is presented in [19].)

make-set(x):
    p[x] ← x
    rank[x] ← 0

union(x, y):
    link(find(x), find(y))

link(x, y):
    if rank[x] > rank[y]
        then p[y] ← x
        else p[x] ← y
             if rank[x] = rank[y]
                 then rank[y] ← rank[y] + 1

find(x):
    if p[x] ≠ x
        then p[x] ← find(p[x])
    return p[x]

Fig. 1. The classical union-find data structure
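The pseudocode of Fig. 1 translates into Python roughly as follows. This is a sketch: `find` uses an iterative two-pass form of path compression instead of recursion, and `union` skips linking when both elements already share a root, a guard that Fig. 1 does not include (without it, a root linked to itself would have its rank incremented). The class name `UnionFind` is illustrative.

```python
class UnionFind:
    """Union by rank with path compression, following Fig. 1."""

    def __init__(self):
        self.p, self.rank = {}, {}

    def make_set(self, x):
        self.p[x] = x
        self.rank[x] = 0

    def find(self, x):
        root = x
        while self.p[root] != root:      # first pass: locate the root
            root = self.p[root]
        while self.p[x] != root:         # second pass: point the path at the root
            self.p[x], x = root, self.p[x]
        return root

    def link(self, x, y):
        """Link two roots by rank, as in Fig. 1."""
        if self.rank[x] > self.rank[y]:
            self.p[y] = x
        else:
            self.p[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def union(self, x, y):
        x, y = self.find(x), self.find(y)
        if x != y:                       # guard not present in Fig. 1
            self.link(x, y)
```
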

2 The Union-Find Data Structure 

A union-find data structure supports the following operations: 
make-set(x) - Create a set that contains the single element x.
union(x, y) - Unite the sets containing the elements x and y.
find(x) - Return a representative of the set containing the element x.
A classical, simple, and extremely efficient implementation of a union-find
data structure is given in Figure 1. Each element x has a parent pointer p[x] and a
rank rank[x] associated with it. The parent pointers define trees that correspond
to the sets maintained by the data structure. The representative element of each
set is taken to be the root of the tree containing the elements of the set. To find
the representative element of a set, we simply follow the parent pointers until we
get to a root. To speed up future find operations, we employ the path compression
heuristic that makes all the vertices encountered on the way to the root direct
children of the root. Unions are implemented using the union by rank heuristic.
The rank rank[x] associated with each element x is an upper bound on the depth
of its subtree. In a seminal paper, Tarjan [16] showed that the time taken by the
algorithm of Figure 1 to process an intermixed sequence of m make-set, union
and find operations, out of which n are make-set operations, is $O(m\,\alpha(m, n))$,
where $\alpha(m, n)$ is the extremely slowly growing inverse of Ackermann’s function.
The analysis of the next section relies on the following lemma: 

Lemma 1. Suppose that an intermixed sequence of $n$ make-set operations, at
most $n$ link operations, and at most $f$ find operations are performed on the
standard union-find data structure. Then, the number of times the parent pointers
of elements of rank $k$ or more are changed is at most $O((f + n/2^k) \cdot \alpha(f + n/2^k, n/2^k))$.

Proof. (Sketch) A node $x$ is said to be high if $rank[x] \ge k$. There are at most $n/2^k$
high elements. The changes made to the pointers of the high elements may be
seen as resulting from a sequence of at most $n/2^k$ make-set operations, $n/2^k$ link
operations and $f$ find operations performed on these elements. By the standard
analysis of the union-find data structure, the total cost of at most $f + n/2^{k-1}$
union-find operations on $n/2^k$ elements is at most
$O((f + n/2^{k-1}) \cdot \alpha(f + n/2^{k-1}, n/2^k)) = O((f + n/2^k) \cdot \alpha(f + n/2^k, n/2^k))$, as required. □

3 The Transformation 

In this section we describe a transformation that combines a non-meldable pri- 
ority queue data structure with the classical union-find data structure to pro- 
duce a meldable priority queue data structure with essentially no increase in the 
amortized operation cost. This transformation is essentially the transformation 
described in [14] with some minor modifications. An improved analysis of this 
transformation appears in the next section. 

This transformation T receives a non-meldable priority queue data structure
V and produces a meldable priority queue data structure T(V). We assume
that the non-meldable data structure V supports the following operations:

make-pq(x) - Create a priority queue that contains the single element x.

insert(PQ, x) - Insert the element x into the priority queue PQ.

delete(PQ, x) - Delete the element x from the priority queue PQ.

find-min(PQ) - Find an element with the smallest key contained in PQ.

It is assumed, of course, that each element x has a key key[x] associated with 
it. We can easily add the following operation to the repertoire of the operations 
supported by this priority queue: 

change-key(PQ, x, k) - Change the key of element x in PQ to k.

This is done by deleting the element x from the priority queue PQ, changing
its key by setting key[x] ← k, and then reinserting it into the priority queue.
(Some priority queues directly support operations like decrease-key. We shall 
not assume such capabilities in this section.) 

We combine this non-meldable priority queue with the union-find data struc- 
ture to obtain a meldable priority queue that supports the following operations: 

MAKE-PQ(x) - Create a priority queue containing the single element x.

INSERT(x, y) - Insert element y into the priority queue whose root is x.

DELETE(x) - Delete element x from the priority queue containing it.

FIND-MIN(x) - Find an element with the smallest key in the queue with root x.

MELD(x, y) - Meld the queues whose root elements are x and y.

CHNG-KEY(x, k) - Change the key associated with element x to k.

As in the union-find data structure, each priority queue will have a
representative, or root, element. The operations INSERT(x, y) and FIND-MIN(x)
assume that x is the root element of its priority queue. Similarly, MELD(x, y)
assumes that x and y are root elements. It is possible to extend the data structure
with an additional union-find data structure that supports a find(x) operation
that returns the root element of the priority queue containing x. (As explained
in [11], a meldable priority queue data structure that supports a MELD(x, y)
operation that melds the priority queues containing the elements x and y, where x
and y are not necessarily representative elements, must include, at least
implicitly, an implementation of a union-find data structure.)

MAKE-PQ(x):
    p[x] ← x
    rank[x] ← 0
    PQ[x] ← make-pq(x)

INSERT(x, y):
    MAKE-PQ(y)
    MELD(x, y)

DELETE(x):
    CHNG-KEY(x, +∞)

FIND-MIN(x):
    find-min(PQ[x])

CHNG-KEY(x, k):
    change-key(PQ[x], x, k)
    FIND(x)

MELD(x, y):
    if rank[x] > rank[y]
        then HANG(y, x)
        else HANG(x, y)
             if rank[x] = rank[y]
                 then rank[y] ← rank[y] + 1

FIND(x):
    CUT-PATH(x)
    COMPRESS-PATH(x)
    return p[x]

CUT-PATH(x):
    if p[x] ≠ x then
        CUT-PATH(p[x])
        UNHANG(x, p[x])

COMPRESS-PATH(x):
    if p[x] ≠ x then
        COMPRESS-PATH(p[x])
        HANG(x, p[p[x]])

HANG(x, y):
    insert(PQ[y], find-min(PQ[x]))
    p[x] ← y

UNHANG(x, y):
    delete(PQ[y], find-min(PQ[x]))

Fig. 2. A meldable priority queue obtained by planting a non-meldable priority queue
at each node of the union-find data structure.

A collection of meldable priority queues is now maintained as follows. Each
priority queue of the collection is maintained as a tree of a union-find data
structure. Each element x contained in such a tree thus has a parent pointer p[x]
assigned to it by the union-find data structure and a rank rank[x]. In addition
to that, each element x has a ‘local’ priority queue PQ[x] associated with it.
This priority queue contains the element x itself, and the minimal element of
each subtree of x. (Thus if x has d children, PQ[x] contains d + 1 elements.)
If x is at the root of a union-find tree, then to find the minimal element in
the priority queue of x, a FIND-MIN(x) operation, we simply need to find the
minimal element in the priority queue PQ[x], a find-min(PQ[x]) operation.

When an element x is first inserted into a priority queue, by a MAKE-PQ(x) operation, we initialize the priority queue PQ[x] of x to contain x and no other element. We also set p[x] to x, to signify that x is a root, and set rank[x] to 0.

If x and y are root elements of the union-find trees containing them, then a MELD(x, y) operation is performed as follows. As in the union-find data structure, we compare the ranks of x and y and hang the element with the smaller rank on the element with the larger rank. If the ranks are equal, we decide, arbitrarily, to hang x on y, and we increment rank[y]. Finally, if x is hung on y, then to maintain the invariant condition stated above, we insert the minimal element in PQ[x] into PQ[y], an insert(PQ[y], find-min(PQ[x])) operation. (If y is hung on x, we perform an insert(PQ[x], find-min(PQ[y])) operation.)

A DELETE(x) operation, which deletes x from the priority queue containing it, is implemented in the following indirect way. We change the key associated with x to +∞, using a CHNG-KEY(x, +∞) operation, to signify that x was deleted, and we make the necessary changes to the data structure, as described below. Each priority queue in our collection keeps track of the total number of elements contained in it and of the number of deleted elements contained in it. When the fraction of deleted elements exceeds one half, we simply rebuild this priority queue. This affects the amortized cost of all the operations by only a constant factor. (For more details see Kaplan et al. [12].)

How do we implement a CHNG-KEY(x, k) operation, then? If x is a root element, we simply change the key of x in PQ[x] using a change-key(PQ[x], x, k) operation. If x is not a root, then before changing the key of x we perform a FIND(x) operation. A FIND(x) operation compresses the path connecting x to the root by cutting all the edges along the path and hanging all the elements encountered directly on the root. Let $x = x_1, x_2, \ldots, x_k$ be the sequence of elements on the path from x to the root of its tree. For $i = k-1, k-2, \ldots, 1$ we unhang $x_i$ from $x_{i+1}$. This is done by removing find-min($PQ[x_i]$) from $PQ[x_{i+1}]$. After that, we hang all the elements $x_1, x_2, \ldots, x_{k-1}$ on $x_k$. This is done by setting $p[x_i]$ to $x_k$ and by adding find-min($PQ[x_i]$) to $PQ[x_k]$. (Note that we also unhang $x_{k-1}$ from $x_k$ and then hang it back.)

If x is not a root element, then after a FIND(x) operation, x is a child of the root. Changing the key of x is now relatively simple. We again unhang x from p[x], change the key of x and then hang x again on p[x]. A moment’s reflection shows that it is, in fact, enough just to change the key of x in PQ[x], and then perform a FIND(x) operation. The element x may temporarily be contained in some priority queues with a wrong key, but this will immediately be corrected.

A simple implementation of all these operations is given in Figure 2. The 
important thing to note is that the operation of a meldable priority queue mimics 
the operation of a union-find data structure, and that changing a pointer p[x] from y to y' is accompanied by calls to UNHANG(x, y) and HANG(x, y').
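To make the pairing of pointer changes with local priority-queue updates concrete, here is a minimal Python sketch of the operations in Figure 2. It is an illustration under stated assumptions, not the authors' implementation: each local queue PQ[x] is a plain Python set whose minimum is found by scanning with the current keys (a stand-in for a real non-meldable priority queue), keys live in one shared dictionary, and a hypothetical field `rep[x]` records which element HANG(x, y) inserted into PQ[y], so that the matching UNHANG deletes exactly that element even after a key change. The class name `PlantedMeldablePQ` is ours, and the periodic rebuilding after many deletions is omitted.

```python
import math


class PlantedMeldablePQ:
    """Sketch of Fig. 2: a local priority queue planted at each union-find node.

    Assumptions (not from the paper): PQ[x] is a Python set scanned by key,
    and rep[x] remembers the element HANG(x, p[x]) inserted into PQ[p[x]].
    """

    def __init__(self):
        self.p, self.rank, self.key = {}, {}, {}
        self.pq = {}   # PQ[x]: x itself plus one element per child subtree
        self.rep = {}  # hypothetical helper used by UNHANG

    def _find_min_local(self, x):
        return min(self.pq[x], key=lambda e: self.key[e])

    def make_pq(self, x, k):
        self.p[x], self.rank[x], self.key[x] = x, 0, k
        self.pq[x] = {x}
        return x

    def _hang(self, x, y):
        m = self._find_min_local(x)
        self.pq[y].add(m)            # insert(PQ[y], find-min(PQ[x]))
        self.rep[x] = m
        self.p[x] = y

    def _unhang(self, x, y):
        self.pq[y].remove(self.rep.pop(x))  # delete the element HANG put there

    def meld(self, x, y):
        """x and y must be roots; returns the root of the melded queue."""
        if self.rank[x] > self.rank[y]:
            self._hang(y, x)
            return x
        self._hang(x, y)
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        return y

    def insert(self, x, y, k):
        self.make_pq(y, k)
        return self.meld(x, y)

    def find_min(self, x):
        """x must be a root."""
        return self._find_min_local(x)

    def _cut_path(self, x):
        if self.p[x] != x:
            self._cut_path(self.p[x])       # unhang from the top of the path down
            self._unhang(x, self.p[x])

    def _compress_path(self, x):
        if self.p[x] != x:
            self._compress_path(self.p[x])
            self._hang(x, self.p[self.p[x]])  # p[p[x]] is now the root

    def find(self, x):
        self._cut_path(x)
        self._compress_path(x)
        return self.p[x]

    def change_key(self, x, k):
        self.key[x] = k
        return self.find(x)

    def delete(self, x):
        return self.change_key(x, math.inf)
```

For example, melding four singletons pairwise and then the two pairs builds a path of length two; lowering the key of the deepest element then makes FIND hang it directly on the root. A real instantiation would replace the set scan by an actual priority queue supporting insert, delete and find-min, and would delete by handle rather than by value.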

Since the union-find data structure makes only an amortized number of $O(\alpha(n, n))$ hangings and unhangings per union or find operation, we immediately get that each meldable priority queue operation takes only $O(pq(n)\,\alpha(n, n))$ amortized time. This was the result obtained in [14]. Here, we tighten the analysis so as to get no asymptotic overhead with current priority queues.



4 The Improved Analysis 

In this section we present an improved analysis of the data structure presented in the previous section. We assume that the non-meldable priority queue $\mathcal{P}$ supports insert, delete and find-min operations in $O(pq(n))$ (randomized) amortized time. By applying a simple transformation described in [1], we can actually assume that the amortized cost of insert and find-min operations is $O(1)$ and that only the amortized cost of delete operations is $O(pq(n))$. We now claim:

Theorem 1. If $\mathcal{P}$ is a priority queue data structure that supports insert and find-min operations in $O(1)$ (expected) amortized time and delete operations in $O(pq(n))$ (expected) amortized time, then $\mathcal{T}(\mathcal{P})$ is a priority queue data structure that supports insert, find-min and meld operations in $O(1)$ (expected) amortized time and delete operations in $O(pq(n)\,\alpha(n, n/pq(n)))$ (expected) amortized time, where $\alpha(m, n)$ is the inverse Ackermann function appearing in the analysis of the union-find data structure, and n here is the maximum number of elements contained in the priority queue.

Proof. Consider a sequence of n operations on the data structure, of which $f \le n$ are DELETE or CHNG-KEY operations. (Each such operation results in a FIND operation being performed, hence the choice of the letter f.) Our aim is to show that the cost of carrying out all these operations is $O(n + f\,pq(n)\,\alpha(n, n/pq(n)))$. This bounds the amortized cost of each operation in terms of the maximum number of elements contained in the priority queues. In the full version of the paper we will give a slightly more complicated analysis that bounds the amortized complexity of each operation in terms of the actual number of elements contained in the priority queue at the time of the operation.

All the operations on the data structure are associated with changes made to the parent pointers p[x] of the elements contained in the priority queues. To change the value of p[x] from y to y', we first call UNHANG(x, y), which performs a delete operation on PQ[y], and then call HANG(x, y'), which performs an insert operation on PQ[y'] and sets p[x] to y'. As insert operations are assumed to take constant time, we can concentrate our attention on the delete, or UNHANG, operations. As the total number of pointer changes made in the union-find data structure is at most $O(n\,\alpha(n, n))$, and as each priority queue acted upon is of size at most n, we immediately get an upper bound of $O(n\,pq(n)\,\alpha(n, n))$ on the total time of all the operations performed. This is essentially the analysis presented in [14]. We want to do better than that.

If element x is a root of one of the union-find trees, we let size(x) be the number of elements contained in its tree. If x is no longer a root, we let size(x) be the number of descendants it had just before it was hung on another element. It is easy to see that we always have $size(x) \ge 2^{rank(x)}$.

Let $p = pq(n) + n/f$, $S = p^2$ and $L = \log S$. We say that an element x is big if $size(x) \ge S$. Otherwise, it is said to be small. We say that an element x is high if $rank(x) \ge L$. Otherwise, it is said to be low. Note that if an element is big (or high), so are all its ancestors. We also note that all high elements are big, but big elements are not necessarily high. We let SMALL, BIG, LOW and HIGH be the sets of small, big, low and high elements, respectively. As noted above, we have SMALL ⊆ LOW and HIGH ⊆ BIG, but LOW ∩ BIG may be non-empty.
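To get a feel for the scales involved, the short Python sketch below evaluates the thresholds for one concrete input. The choice $pq(n) = \log_2 n$ (as for binary heaps) and the sample values of n and f are assumptions made only for illustration, and $S = p^2$ is taken as the definition of S. The last line compares the Case 3 count $L^2 n/S$ used below with $n/p$.

```python
import math


def thresholds(n, f, pq=math.log2):
    """p = pq(n) + n/f, S = p**2, L = log2(S), as in the proof of Theorem 1."""
    p = pq(n) + n / f
    S = p ** 2
    L = math.log2(S)
    return p, S, L


n, f = 2 ** 20, 1000          # sample sizes, chosen only for illustration
p, S, L = thresholds(n, f)
case3 = L ** 2 * n / S        # Case 3 count of unhangs inside BIG ∩ LOW
print(f"p = {p:.1f}, S = {S:.3g}, L = {L:.2f}")
print(f"L^2 n / S = {case3:.1f}  vs  n / p = {n / p:.1f}")
```

Because $p \ge n/f$, the inequality $f \ge n/p$ holds for every choice of inputs, and it is exactly what Case 4 relies on.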

Below we bound the total cost of all the UNHANG(x, p[x]) operations. All other operations take only $O(n)$ time. We separate the analysis into four cases:

Case 1: $x, p[x] \in$ SMALL.

We perform at most f path compressions. Each path in the union-find forest contains at most L small elements. (This follows from the invariant rank[p[x]] > rank[x] and from the fact that high elements are big.) Thus, each path compression involves at most L unhang operations in which $x, p[x] \in$ SMALL. As each priority queue involved is of size at most S, the total cost is $O(f \cdot L \cdot pq(S)) = O(f \cdot p) = O(n + f \cdot pq(n))$, since $f \cdot p = f\,pq(n) + n$. (Note that $L = \log S = O(\log p)$ and that $pq(S) = O(\log S) = O(\log p)$, as we assume that $pq(n) = O(\log n)$. Hence $L \cdot pq(S) = O(\log^2 p) = O(p)$.)

Case 2: $x \in$ SMALL and $p[x] \in$ BIG.

In each one of the f path compressions performed, there is at most one unhang operation of this form (as ancestors of big elements are also big). Hence, the total cost here is $O(f\,pq(n))$.

Case 3: $x, p[x] \in$ BIG ∩ LOW.

To bound the total cost of these operations, we bound the number of elements that are contained at some stage in BIG ∩ LOW. An element is said to be a minimally-big element if it is big but all its descendants are small. As each element can have at most one minimally-big ancestor, and each minimally-big element has at least S descendants, it follows that there are at most n/S minimally-big elements. As each big element is an ancestor of a minimally-big element, it follows that there are at most Ln/S elements in BIG ∩ LOW.

An element $x \in$ BIG ∩ LOW can be unhanged from at most L other elements of BIG ∩ LOW. (After each such operation rank[p[x]] increases, so after at most L such operations p[x] must be high.) The total number of operations of this form is therefore at most $L^2 n/S = O(n/p)$, as $L^2 = O(\log^2 p)$ and $S = p^2$. Thus, the total cost of all these operations is $O(n\,pq(n)/p) = O(n)$, since $p \ge pq(n)$.

Case 4: $x, p[x] \in$ HIGH.

To bound the number of UNHANG(x, p[x]) operations in which $x, p[x] \in$ HIGH, we rely on Lemma 1. As each UNHANG(x, p[x]) operation where $x \in$ HIGH is associated with a parent pointer change of a high vertex, it follows that the total number of such operations is at most $O((f + n/S)\,\alpha(f + n/S, n/S)) = O(f\,\alpha(f, n/S))$. (This follows as $n/S \le f$.) Now

$$\alpha(f, n/S) \le \alpha(n/p, n/p^2) \le \alpha(n, n/p) \le \alpha(n, n/pq(n)).$$

This chain of inequalities follows from the fact that $f \ge n/p$ (as $p \ge n/f$) and from simple properties of the $\alpha(m, n)$ function. (The $\alpha(m, n)$ function is non-increasing in its first argument, non-decreasing in its second, and $\alpha(m, n) \le \alpha(cm, cn)$ for $c \ge 1$.)

As the cost of each delete operation is $O(pq(n))$, the cost of all unhang operations with $x, p[x] \in$ HIGH is at most $O(f \cdot pq(n) \cdot \alpha(n, n/pq(n)))$, as required. □
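For reference, adding the bounds of the four cases to the $O(n)$ cost of all operations other than UNHANG gives the bound stated at the start of the proof; the last step uses $\alpha(n, n/pq(n)) \ge 1$, so the Case 2 term is absorbed by the Case 4 term:

$$
\begin{aligned}
\text{total cost}
 &= \underbrace{O(n)}_{\text{other ops}}
  + \underbrace{O(n + f\,pq(n))}_{\text{Case 1}}
  + \underbrace{O(f\,pq(n))}_{\text{Case 2}}
  + \underbrace{O(n)}_{\text{Case 3}}
  + \underbrace{O\bigl(f\,pq(n)\,\alpha(n, n/pq(n))\bigr)}_{\text{Case 4}}\\
 &= O\bigl(n + f\,pq(n)\,\alpha(n, n/pq(n))\bigr).
\end{aligned}
$$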



5 Bounds in Terms of the Maximal Key Value 

In this section we describe a simple transformation, independent of the transformation of Section 3, that speeds up the operation of a meldable priority queue data structure when the keys of the elements are integers taken from the range [1, N], where N is small relative to n, the number of elements. More specifically, we show that if $\mathcal{P}$ is a meldable priority queue data structure that supports delete operations in $O(pq(n))$ amortized time, and all other operations in $O(1)$ amortized time, where n is the number of elements in the priority queue, then it is possible to transform it into a meldable priority queue data structure $\mathcal{T}'(\mathcal{P})$ that supports delete operations in $O(pq(\min\{n, N\}))$ amortized time, and all other operations in $O(1)$ amortized time. To implement this transformation we need random access capabilities, so it cannot be implemented on a pointer machine.

To simplify the presentation of the transformation, we assume here that a delete operation receives a reference to the element x to be deleted and to the priority queue containing it. This is a fairly standard assumption.¹ Note, however, that the delete operation obtained by our first transformation is stronger, as it requires only a reference to the element, and not to the priority queue. In the full version of the paper we show that this assumption is not really necessary, so the delete operations obtained using the transformation $\mathcal{T}'$ again require only a reference to the element to be deleted.

The new data structure $\mathcal{T}'(\mathcal{P})$ uses two different representations of priority queues. The first representation, called the original, or non-compressed, representation, is simply the representation used by $\mathcal{P}$. The second representation, called the compressed representation, is composed of an array of size N containing, for each integer $k \in [1, N]$, a pointer to a doubly linked list of the elements with key k contained in the priority queue. (Some of the lists may, of course, be empty.) In addition to that, the compressed representation uses an original representation of a priority queue that holds the up to N distinct keys belonging to the elements of the priority queue.

Initially, all priority queues are held using the original representation. When, as a result of an insert or a meld operation, a priority queue contains more than N elements, we convert it to compressed representation. This can be easily carried out in $O(N)$ time. When, as a result of a delete operation, the size of a priority queue drops below N/2, we revert to the original representation. This again takes $O(N)$ time. The original representation is therefore used to maintain small priority queues, i.e., priority queues containing up to N elements. The compressed representation is used to represent large priority queues, i.e., priority queues containing at least N/2 elements. (Priority queues containing between N/2 and N elements are both small and large.)

By definition, we can insert elements into non-compressed priority queues in $O(1)$ amortized time, and delete elements from them in $O(pq(n)) = O(pq(N))$ amortized time, as such a queue holds at most N elements. We can also insert an element into a compressed priority queue in $O(1)$ amortized time. We simply add the element to the appropriate linked list, and if the added element is the first element of the list, we also add the key of the element to the priority queue of keys. Similarly, we can delete an element from a compressed priority queue in $O(pq(N))$ amortized time. We delete the element from the corresponding linked list. If that list is now empty, we delete the key from the priority queue of keys, which is held in non-compressed representation. As this priority queue contains at most N keys, that can be done in $O(pq(N))$ amortized time. Since insert and delete operations are supplied with a reference to the priority queue into which an element should be inserted, or from which it should be deleted, we can keep a count of the number of elements contained in the priority queue. This can be done for both representations. (Here is where we use the assumption made earlier. As mentioned, we will explain later why this assumption is not really necessary.) These counts tell us when the representation of a priority queue should be changed.

¹ A reference to the appropriate priority queue can be obtained using a separate union-find data structure. The amortized cost of finding a reference is then $O(\alpha(n, n))$. This is not good enough for us here, as we are after bounds that are independent of n.

A small priority queue and a large priority queue can be melded simply by inserting each element of the small priority queue into the large one. Even though this takes $O(n)$ time, where n is the number of elements in the small priority queue, we show below that the amortized cost of this operation is only $O(1)$.

Two large priority queues can be easily melded in $O(N)$ time. We simply concatenate the corresponding linked lists and add the keys that are found, say, in the second priority queue, but not in the first, into the priority queue that holds the keys of the first priority queue. The second priority queue is then destroyed. We also update the size of the obtained queue. Again, we show below that the amortized cost of this is only $O(1)$.
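The compressed representation and its two meld rules can be sketched in Python as follows. This is a sketch under assumptions, not the paper's code: keys are integers in [1, N]; each bucket is a Python set standing in for the doubly linked list; and the queue of distinct keys is a `heapq` list with lazy deletion (a key whose bucket has emptied is discarded only when it reaches the top of the heap), which is a swapped-in technique rather than the original representation of $\mathcal{P}$ used in the text. The class and method names are hypothetical.

```python
import heapq


class CompressedPQ:
    """Sketch of the compressed representation for integer keys in [1, N].

    Assumptions: each bucket is a Python set standing in for a doubly linked
    list, and the queue of distinct keys is a heapq list with lazy deletion.
    """

    def __init__(self, N):
        self.N = N
        self._clear()

    def _clear(self):
        self.bucket = [set() for _ in range(self.N + 1)]  # bucket[k]: elements with key k
        self.keys = []   # heap of keys; may hold stale or duplicate keys
        self.size = 0    # element count that would drive the conversions

    def insert(self, elem, k):
        if not self.bucket[k]:
            heapq.heappush(self.keys, k)   # first element with key k
        self.bucket[k].add(elem)
        self.size += 1

    def delete(self, elem, k):
        self.bucket[k].remove(elem)
        self.size -= 1
        # An emptied bucket leaves key k stale in the heap; find_min drops it later.

    def find_min(self):
        while self.keys and not self.bucket[self.keys[0]]:
            heapq.heappop(self.keys)       # discard stale keys
        if not self.keys:
            return None
        k = self.keys[0]
        return next(iter(self.bucket[k])), k

    def meld_compressed(self, other):
        """Meld two large queues: merge the N buckets, add keys missing from self."""
        for k in range(1, self.N + 1):
            if other.bucket[k]:
                if not self.bucket[k]:
                    heapq.heappush(self.keys, k)
                self.bucket[k] |= other.bucket[k]
        self.size += other.size
        other._clear()                     # the second queue is destroyed

    def meld_small(self, items):
        """Meld a small queue, given as (element, key) pairs, by inserting each one."""
        for elem, k in items:
            self.insert(elem, k)
```

Two simplifications relative to the text are worth noting: set union copies the second queue's elements, so `meld_compressed` costs $O(N)$ plus the size of the second queue rather than the $O(N)$ of list concatenation, and the conversions at the N and N/2 thresholds, which would be driven by `size`, are not shown.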

Theorem 2. If $\mathcal{P}$ is a priority queue data structure that supports insert, find-min and meld operations in $O(1)$ (expected) amortized time and delete operations in $O(pq(n))$ (expected) amortized time, then $\mathcal{T}'(\mathcal{P})$ is a priority queue data structure that supports insert, find-min and meld operations in $O(1)$ (expected) amortized time and delete operations in $O(pq(\min\{n, N\}))$ (expected) amortized time.

Proof. We use a simple potential-based argument. The potential of a priority queue held in original, non-compressed, representation is defined to be 1.5n, where n is the number of elements contained in it. The potential of a compressed priority queue is N, no matter how many elements it contains. The potential of the whole data structure is the sum of the potentials of all the priority queues.

The operations insert, delete and find-min change the potential of the data structure by at most an additive constant. Thus, their amortized cost exceeds the actual cost bounded above by at most a constant: $O(1)$ for insert and find-min, and $O(pq(N))$ for delete.

Compressing a priority queue containing $N < n \le 2N$ elements requires $O(N)$ operations, but it reduces the potential of the priority queue from 1.5n to N, a drop of at least N/2, so with proper scaling the amortized cost of this operation may be taken to be 0. Similarly, when a compressed priority queue containing $n < N/2$ elements is converted to original representation, the potential of the priority queue drops from N to 1.5n, a drop of at least N/4, so the amortized cost of this operation is again 0.

Melding two original priority queues has a constant actual cost. As the poten- 
tial of the data structure does not change, the amortized cost is also constant. 
Melding two compressed priority queues has an actual cost of $O(N)$, but the potential of the data structure is decreased by N, so the amortized cost of such meld operations is 0. Finally, melding a small priority queue of size $n \le N$, in original representation, and a compressed priority queue has an actual cost of $O(n)$, but the potential decreases by 1.5n, giving again an amortized cost of 0. This completes the proof. □
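The potential changes used in this proof can be collected in one place. Below, n, $n_1$ and $n_2$ are sizes of queues in original representation; scaling the potential by a suitable constant c lets each $O(N)$ or $O(n)$ actual cost cancel against $c\,\Delta\Phi$, while melding two original queues leaves the potential unchanged and keeps its $O(1)$ actual cost:

$$
\begin{aligned}
&\text{compress } (N < n \le 2N): && \Delta\Phi = N - 1.5n < -N/2,\\
&\text{decompress } (n < N/2): && \Delta\Phi = 1.5n - N < -N/4,\\
&\text{meld two original queues}: && \Delta\Phi = 1.5(n_1 + n_2) - 1.5n_1 - 1.5n_2 = 0,\\
&\text{meld two compressed queues}: && \Delta\Phi = N - 2N = -N,\\
&\text{meld original (size } n \le N\text{) into compressed}: && \Delta\Phi = N - (N + 1.5n) = -1.5n.
\end{aligned}
$$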

6 Further Work 

By combining the transformation of Section 3 with the atomic heaps of Fredman and Willard [7], we can obtain a transformation that converts a non-meldable priority queue data structure $\mathcal{P}$ with operation time $O(pq(n))$ into a meldable priority queue data structure that supports insert, meld and find-min operations in $O(1)$ amortized time, and delete operations in $O(pq(n) + \alpha(n, n))$ amortized time. This is done by using an atomic heap, instead of a $\mathcal{P}$ priority queue, in nodes of the union-find data structure whose size is at most $\log n$. The details will be given in the full version of the paper. This transformation uses, however, a stronger model, in which atomic heaps can be realized (see [21]).
Abstract. The cyclic edge connectivity is the size of a smallest edge cut in a graph such that at least two of the connected components contain cycles. We present an algorithm running in time $O(n^2 \log^2 n)$ for computing the cyclic edge connectivity of n-vertex cubic graphs.



1 Introduction 

The cyclic connectivity as a graph parameter was introduced by Tait [14] as early as 1880. A cyclic edge cut of a graph G is an edge cut such that at least two components of the graph without the cut contain a cycle. If G is not connected and at least two of its components contain a cycle, then the empty set of edges forms a cyclic edge cut. The cyclic edge connectivity of a graph G is the size of a smallest cyclic edge cut. If G is connected, then each smallest cyclic edge cut splits G into exactly two components. A graph may have no cyclic edge cuts at all: $K_4$, $K_5$, and IT„ are examples of such graphs.

The cyclic (vertex and edge) connectivity is an important graph parameter both from the theoretical and the practical points of view. As for the usual vertex and edge connectivity, the cyclic connectivity reflects the level of connectivity of a graph. However, both the usual vertex and edge connectivity are bounded for classes of graphs of bounded degree (say, cubic graphs), and thus they do not say much about how well a particular graph is connected. Hence, the cyclic connectivity can replace the usual connectivity in applications where the considered graphs have bounded maximum degree. Such applications span all areas where graph algorithms have been applied, and they include, e.g., the robustness of local computer networks and parallel computer architectures (networks are usually represented by graphs with bounded degrees, since a single computer/processor usually communicates with only a few other units) or the stability of the tertiary structure of proteins. The cyclic connectivity actually fits the purpose of the latter application extremely well, because a protein never unfolds (splits) into acyclic parts because of its primary and secondary structure [3] (with a possible exception for the exchange of small functional groups in biochemical reactions). From the theoretical point of view, not only is the relation of the cyclic connectivity to other connectivity parameters [13,12] interesting, but the cyclic edge connectivity also plays an essential role in the structure
of cubic graphs [1,2,5,8,9,10]. The problem of developing a polynomial-time algorithm for the cyclic edge connectivity of cubic graphs was actually introduced to us by Roman Nedela in connection with designing an efficient algorithm for generating so-called snarks [11], which are cubic bridgeless graphs that are not 3-edge-colorable.

There is little previous work on algorithms for the cyclic edge connectivity of either cubic graphs or graphs in general. The only published result is an
0(n^ log n)-algorithm by Lou et. al. [6] but the algorithm is incorrect [7]: the 
procedure cyclic_connectivity described on pages 251-253 in [6] does not 
always return the set of all minimal cyclic edge cuts as claimed in the paper. 
In addition, if case 3 applies, the number of cut sets may double and this is 
not considered in the time analysis. So, the algorithm does not always work in 
polynomial time and it does not even produce a correct result for some specific 
graphs. It is not clear whether their algorithm can be fixed [7]. 

In the present paper, we show two algorithms for computing the cyclic edge connectivity of cubic graphs, one running in time $O(n^3 \log n)$ and the other in time $O(n^2 \log^2 n)$. Both our algorithms are easy to implement, and their time bounds do not involve any hidden large constants. So, both can be said to be practical and can be used for each of the purposes mentioned above. The main idea of our algorithms is completely different from the idea of the algorithm of Lou et al., which recursively generates all cyclic edge cuts in a cubic graph. At the end of the paper, we briefly sketch how the algorithms can be extended to regular graphs and to a polynomial-time algorithm for the cyclic edge connectivity of arbitrary graphs.

First, we present our $O(n^3 \log n)$-algorithm in Section 4. When presenting this algorithm, we explain ideas which are later used to design an algorithm with a better running time. The algorithm first computes the girth g of an input graph G. Recall that the girth of a graph G is the length of its shortest cycle. The girth is an upper bound on the cyclic edge connectivity of a cubic graph if $|V(G)| \ge 8$ (Lemma 1). Then, the algorithm computes minimum edge separations for all pairs of vertex-disjoint full trees (see Section 2 for the definition) of depth at most $O(\log g)$ (the number of such pairs is $O(n^2 \log^2 \log n)$). The correctness of such an algorithm follows straightforwardly from Theorem 1.

Next, we present an $O(n^2 \log^2 n)$-algorithm in Section 5. Again, the algorithm first computes the girth g of an input graph G. If the number of vertices of the input graph is small (at most 242), we run the $O(n^3 \log n)$-algorithm for computing the cyclic edge connectivity. Otherwise, we find g edge-disjoint subgraphs of G which are large in a sense introduced later in the paper. Since g is an upper bound on the cyclic edge connectivity, we want to test the existence of a cyclic edge cut of size at most $g - 1$. If such a cyclic edge cut exists, then one of the g subgraphs is disjoint from it. Hence it is enough to compute minimum edge separations between the g subgraphs and vertex-disjoint full trees of depth at most $O(\log g)$. The number of such pairs is only $O(n \log n \log\log n)$, which reduces the time complexity compared to the algorithm from Section 4.

Both our algorithms assume that the input graph is cubic, connected, and contains at least 8 vertices, because such cubic graphs always contain a cyclic edge cut (Lemma 1), which is not the case for cubic graphs on 2, 4 and 6 vertices, e.g., the triple edge, $K_4$ or $K_{3,3}$. This does not harm the usability of our algorithms: there is only a single cubic graph on two vertices (a triple edge), and there are only three cubic graphs on four vertices (two triple edges, two double edges joined by a matching, and a complete graph on four vertices). The only cubic graph on six vertices without a cyclic edge cut is $K_{3,3}$. These five exceptional cases can be easily checked at the very beginning of our algorithms.

2 Notation 

In this section, we introduce notation and definitions used throughout the paper. Graphs considered in this paper may contain parallel edges but no loops. We write V(G) and E(G) for the vertex set and the edge set of a graph G, respectively. G[W] denotes the subgraph of G induced by the vertex set W, $W \subseteq V(G)$. A graph G is said to be cubic if all its vertices have degree three.

An algorithmic procedure which is widely used in computer science, and which we use several times throughout the paper, is breadth-first search (BFS) of a graph. We start at a vertex $v_0$ of G and sequentially assign to the vertices of G labels which are equal to their distances from $v_0$. By a BFS-graph of depth d we mean the graph induced by all the edges uv such that u is labelled by at most $d - 1$ and v by at most d. Note that a BFS-graph of depth d need not be the subgraph of G induced by the vertices at distance at most d from $v_0$ (the edges joining two vertices at distance d are missing). If the BFS-graph is acyclic, we call it a BFS-tree. The vertex $v_0$ is said to be the root of the BFS-graph (BFS-tree). The vertices labelled with the number k form level k of it. We sometimes abuse this notation a little and root the BFS-graph at an edge. In such a case, both end-vertices of the edge are labelled by zero and the other vertices are labelled as described above.
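The definition of a BFS-graph of depth d can be sketched directly in Python. Graphs are given as edge lists (parallel edges allowed, as in the paper) over hashable vertices, and the function names are ours. Rooting at an edge is handled by passing both end-vertices as roots. Since every labelled vertex at level $\ell \ge 1$ keeps its edge to level $\ell - 1$, the BFS-graph is connected (for $d \ge 1$), so it is a BFS-tree exactly when it has one edge fewer than it has vertices.

```python
from collections import defaultdict, deque


def adjacency(edges):
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def bfs_graph(edges, roots, d):
    """Return (labels, kept_edges) of the BFS-graph of depth d rooted at `roots`."""
    adj = adjacency(edges)
    label = {r: 0 for r in roots}
    queue = deque(roots)
    while queue:
        u = queue.popleft()
        if label[u] == d:
            continue                     # do not label beyond depth d
        for v in adj[u]:
            if v not in label:
                label[v] = label[u] + 1
                queue.append(v)
    inf = d + 1                          # unlabelled vertices count as deeper than d
    kept = [(u, v) for u, v in edges
            if min(label.get(u, inf), label.get(v, inf)) <= d - 1
            and max(label.get(u, inf), label.get(v, inf)) <= d]
    return label, kept


def is_bfs_tree(label, kept):
    # The BFS-graph is connected, so it is acyclic iff |E| = |V| - 1.
    return len(kept) == len(label) - 1
```

On $K_4$ (girth $3 = 2 \cdot 1 + 1$) the BFS-graph of depth 1 is a tree with three edges, while the BFS-graph of depth 2 already contains all six edges; on the Petersen graph (girth 5) the BFS-graph of depth 2 from any vertex is a tree on all ten vertices, matching the first of the simple facts stated next.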

We often use arguments related to cleverly chosen BFS-graphs to prove upper bounds on the girth of a graph. Observe the following simple facts: if the girth of G is at least $2k + 1$, then a BFS-graph of depth k rooted at any vertex of G is acyclic. If the girth of G is at least $2k + 2$, then the vertices of a BFS-graph of depth k rooted at any vertex of G induce an acyclic subgraph of G. This concept may actually be used to prove the following lemma from [6]:

Lemma 1. Let G be a cubic graph of order n and girth g, $n \ge 8$. Then, $g \le 2\lceil \log_2(n/3 + 1) \rceil$. Moreover, G contains a cyclic edge cut of size g.
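The girth bound of Lemma 1 is easy to check numerically on small examples. The sketch below (function names ours) computes the girth of a multigraph by deleting each edge uv in turn and measuring the distance from u to v in what remains, which is simple rather than fast, and evaluates $2\lceil \log_2(n/3 + 1) \rceil$ with integer arithmetic. The test graphs, the cube $Q_3$ (n = 8, girth 4) and the Petersen graph (n = 10, girth 5), are our choice; it does not check the second part of the lemma.

```python
import math
from collections import deque


def girth(n, edges):
    """Shortest cycle length of a multigraph on vertices 0..n-1 (math.inf if acyclic)."""
    best = math.inf
    for i, (a, b) in enumerate(edges):
        adj = [[] for _ in range(n)]
        for j, (u, v) in enumerate(edges):
            if j != i:                     # graph with edge i removed
                adj[u].append(v)
                adj[v].append(u)
        dist = [-1] * n
        dist[a] = 0
        queue = deque([a])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if dist[b] >= 0:                   # a path a..b plus edge i closes a cycle
            best = min(best, dist[b] + 1)
    return best


def lemma1_bound(n):
    """2 * ceil(log2(n/3 + 1)), computed as 2k for the least k with 3 * 2**k >= n + 3."""
    k = 0
    while 3 * 2 ** k < n + 3:
        k += 1
    return 2 * k
```

The cube $Q_3$ attains the bound with equality (girth 4, bound 4), while the Petersen graph has girth 5 against a bound of 6.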
Edge cuts are understood in this paper as decompositions of the vertex set of G into two sets. A cut is denoted by (A, B), where A and B are disjoint subsets of V(G) such that $A \cup B = V(G)$, i.e., it is viewed as a partitioning of the vertex set into two subsets. The cut itself is formed by the edges joining a vertex of A and a vertex of B, and its size is equal to the number of such edges. An edge cut is said to be cyclic if both G[A] and G[B] contain a cycle.

We consider the problem of finding cyclic edge cuts in cubic graphs. The definition introduced in this paragraph and the next lemma give us a tool to prove that an edge cut is a cyclic edge cut. Let G' be a connected subgraph of G. Then, the activity of G' is equal to $\sum_{v \in V(G')} (3 - \deg_{G'} v)$. Note that the activity of G' is at least the number of edges incident with the vertices of V(G') but not included in G'. The activity might be larger than this number because the edges not belonging to G' which join two vertices of G' are counted “twice”. A motivation for this definition is obvious from the following lemma:

Lemma 2. Let $G_1$ and $G_2$ be vertex-disjoint connected subgraphs of a cubic graph G, each of activity at least $k + 1$. Any edge cut separating $G_1$ and $G_2$ which is of size at most k is a cyclic edge cut.

Proof. Let (A, B) be an edge cut of size at most k such that $V(G_1) \subseteq A$ and $V(G_2) \subseteq B$, and let $E_{AB}$ be the set of edges joining A and B. We prove that each of G[A] and G[B] contains a cycle. By symmetry, we focus on the case of G[A] only. If the subgraph of G induced by $V(G_1)$ is not acyclic, then the claim is trivial. Hence, we may assume that $G_1$ is an induced subtree of G.

Assume now, for the sake of contradiction, that G[A] is a forest. Let $E_1$ be the set of edges incident with a vertex of $G_1$ but not in $G_1$. Since $G_1$ is an induced subtree of G, the activity of $G_1$ is equal to $|E_1|$, and hence $|E_1| \ge k + 1$. Then, the number of leaves of the forest G[A] which are not vertices of $G_1$ must be at least $|E_1 \setminus E_{AB}| = |E_1| - |E_1 \cap E_{AB}| \ge k + 1 - |E_1 \cap E_{AB}|$. Each such leaf of G[A] is incident with at least two edges of $E_{AB}$ because G is cubic. Hence, the size of $E_{AB}$ has to be at least $2(k + 1 - |E_1 \cap E_{AB}|) + |E_1 \cap E_{AB}| = 2k + 2 - |E_1 \cap E_{AB}| \ge k + 2$ (as $|E_1 \cap E_{AB}| \le |E_{AB}| \le k$), which is impossible.

The graphs for which we often use Lemma 2 are trees. A complete binary tree of depth d is a tree rooted at a vertex v with levels 0, 1, …, d such that each vertex of levels 0, 1, …, $d - 1$ has two children. For the sake of brevity, we call a complete binary tree of depth d just a binary tree of depth d. The number of vertices of the last level of a binary tree of depth d is $2^d$, and the activity of such a binary tree contained in a cubic graph is $1 + 2^{d+1}$. A full tree of depth d is a tree rooted at a vertex v with levels 0, 1, …, d such that the vertex v has three children and each vertex at a level between 1 and $d - 1$ has two children. The number of vertices of the last level is $3 \cdot 2^{d-1}$, and the activity of a full tree of depth d which is a subgraph of a cubic graph is $3 \cdot 2^d$.
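The activity formulas above can be checked mechanically. The sketch below (function names ours) computes the activity $\sum_{v \in V(G')} (3 - \deg_{G'} v)$ of a subgraph given by its vertices and edges, and builds a full tree of depth d as the BFS-tree of depth d, under the assumption that the cubic host graph has girth at least $2d + 1$, so that this BFS-tree is indeed a full tree. Degrees are counted inside G' only, so an edge of G joining two vertices of G' but missing from G' is counted twice, as noted in the definition.

```python
from collections import Counter


def activity(subgraph_edges, vertices):
    """Activity of a connected subgraph G' of a cubic graph: sum of 3 - deg_{G'}(v)."""
    deg = Counter()
    for u, v in subgraph_edges:
        deg[u] += 1
        deg[v] += 1
    return sum(3 - deg[v] for v in vertices)


def bfs_tree_edges(edges, root, d):
    """Vertices within distance d of root, and the edges to their BFS parents."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    level, tree, frontier = {root: 0}, [], [root]
    for depth in range(1, d + 1):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in level:
                    level[v] = depth
                    tree.append((u, v))
                    nxt.append(v)
        frontier = nxt
    return list(level), tree
```

In the Petersen graph (girth 5), the BFS-trees of depth 1 and 2 are full trees with activities $3 \cdot 2 = 6$ and $3 \cdot 2^2 = 12$, and the path formed by a vertex and two of its neighbours is a binary tree of depth 1 with activity $1 + 2^2 = 5$.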
3 Structural Results

Both our algorithms are based on a structural result, Theorem 1, presented in this section. The result seems to be of independent interest: consider a cubic graph G with cyclic edge connectivity $\kappa$. Then, G either contains a cycle of length $\kappa$, or there exists a cyclic edge cut of size $\kappa$ such that each of the components contains a full tree of depth $\Omega(\log \kappa)$. There is a short proof of the existence of a full tree of depth $\Omega(\log \kappa)$ in this setting, but we need its activity to be at least $\kappa + 1$, so an arbitrary function of order $\Omega(\log \kappa)$ does not suffice for our purposes. Even this stronger statement can be proved easily if $\kappa$ is sufficiently large, but if we want to design practical algorithms, we need the activity of such a tree to be at least $\kappa + 1$ for all values of $\kappa$. It turned out that small values of $\kappa$ (between 6 and 25) actually require quite a lot of work, and the full proof of Theorem 1 is even a little technical in some aspects. Because of this and the space limitations, we only sketch the main ideas of its proof.

The following lemma was proved in [6], but we include its short proof for the sake of completeness:

Lemma 3. Let G be a connected cubic graph with cyclic edge connectivity $\kappa$ and let (A, B) be a cyclic edge cut of size $\kappa$. Then, both G[A] and G[B] are connected graphs with minimum degree two. The number of degree-two vertices in each of G[A] and G[B] is exactly $\kappa$.

Proof. G[A] and G[B] must clearly be connected graphs by the minimality of the edge cut. Assume that G contains a vertex v incident with two edges of a cyclic edge cut (A, B) and v ∈ A. Consider the cut (A', B') such that A' = A \ {v} and B' = B ∪ {v}. Both G[A'] and G[B'] contain a cycle (v has degree one in G[A] and hence it is contained in no cycle of G[A], and B ⊆ B'), but the size of the cut (A', B') is smaller than the size of the cut (A, B) because G is cubic. Thus, if (A, B) is a cyclic edge cut of size κ, no vertex is incident with two edges of the cut. The lemma now follows straightforwardly.
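To make the objects in Lemma 3 concrete, the brute-force sketch below (helper names are ours and hypothetical) tests whether a vertex set A defines a cyclic edge cut and reports the degrees in G[A]. It is meant for small examples only and is not part of the algorithms of this paper. For the triangular prism, the three edges joining the two triangles form a cyclic edge cut of size three, and every vertex of each triangle has degree two, as Lemma 3 predicts.

```python
def induced_degrees(adj, part):
    """Degree of every vertex of `part` inside the induced subgraph G[part]."""
    return {u: sum(1 for x in adj[u] if x in part) for u in part}


def has_cycle(adj, part):
    """True if G[part] contains a cycle, i.e. some component is not a tree."""
    seen = set()
    for s in part:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:  # collect the component of s in G[part]
            u = stack.pop()
            for x in adj[u]:
                if x in part and x not in seen:
                    seen.add(x)
                    comp.add(x)
                    stack.append(x)
        edges = sum(induced_degrees(adj, comp).values()) // 2
        if edges >= len(comp):  # a tree would have len(comp) - 1 edges
            return True
    return False


def cut_edges(adj, A):
    """Edges with exactly one end in A (each listed once, from its A-end)."""
    return [(u, x) for u in A for x in adj[u] if x not in A]


def is_cyclic_cut(adj, A):
    """(A, V - A) is a cyclic edge cut iff both induced sides contain a cycle."""
    A = set(A)
    return has_cycle(adj, A) and has_cycle(adj, set(adj) - A)


prism = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
         3: [4, 5, 0], 4: [3, 5, 1], 5: [3, 4, 2]}
```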



Lemma 4. Let H be a connected graph with n₂ vertices of degree two and n₃ vertices of degree three. If H does not contain a full tree of depth d, d ≥ 1, and n₃ > (2^d − 2)n₂, then H contains a cycle of length at most 2d.

Proof. Let us consider a vertex v of degree three in H. If the BFS-graph of depth d rooted at v is not acyclic, then H contains a cycle of length at most 2d. Otherwise, the BFS-graph of depth d rooted at v is a tree, but it cannot be a full tree. Hence, it contains a non-leaf vertex of degree two, and so the distance between v and the nearest degree-two vertex is at most d − 1. Since the choice of v was arbitrary, we may conclude that each vertex of degree three is at distance at most d − 1 from a vertex of degree two. For a fixed degree-two vertex, the number of vertices at distance between 1 and d − 1 from it is at most 2^d − 2. However, this implies that n₃ ≤ (2^d − 2)n₂, a contradiction.
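The counting step at the end of this proof is a geometric sum. In a graph of maximum degree three, a degree-two vertex has at most two neighbours, and every further BFS level at most doubles, so at most 2^i vertices lie at distance exactly i from it:

$$\sum_{i=1}^{d-1} 2^{i} = 2^{d}-2, \qquad\text{hence}\qquad n_3 \le (2^{d}-2)\,n_2 .$$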
Theorem 1. Let G be a cubic graph with cyclic edge connectivity κ, κ ≥ 1. Then at least one of the following holds:

1. G contains a cycle of length κ.

2. G contains a cyclic edge cut of size κ such that each of the two parts contains a full tree of depth d = ⌈log₂((κ + 1)/3)⌉.

Proof. Assume that G does not contain a cycle of length κ, i.e., the girth of G is at least κ + 1. If none of the cyclic edge cuts (A, B) of size κ has the property that both G[A] and G[B] contain the full tree, then consider one where the number of vertices of A is as small as possible. Let H = G[A]. Clearly, H ≠ G, and the girth of H (a subgraph of G) is at least κ + 1. By Lemma 3, H is connected, its minimum degree is two and the number of degree-two vertices of H is exactly κ. Let n₂ = κ be the number of vertices of degree two (in the rest, we use both n₂ and κ for this same number, depending on which of the two quantities we want to emphasize) and let n₃ be the number of vertices of degree three of H.

We prove the theorem only for κ ≥ 26 (hence d ≥ 4). A (more or less technical) proof of this theorem for smaller values of κ is omitted due to space limitations (it requires finer bounds on the number of vertices in BFS-trees). If H does not contain a full tree of depth d, then n₃ ≤ (2^d − 2)n₂ by Lemma 4, since H has no cycle of length at most 2d < κ + 1. We bound the number of vertices of H as follows:

$$n_2 + n_3 \le (2^d - 1)\,\kappa < \left(2^{\log_2((\kappa+1)/3) + 1} - 1\right)\kappa = \frac{2\kappa - 1}{3}\,\kappa < \frac{2\kappa^2}{3}$$

Let l = ⌊κ/2⌋. Since the girth of H is at least κ + 1, the BFS-graph of depth l rooted at any vertex of degree three is acyclic. It can be shown that H contains no two adjacent vertices of degree two, and thus the number of vertices at level i, 1 ≤ i ≤ l, of this BFS-tree is at least 3 · 2^{⌊(i−1)/2⌋}. Hence the BFS-tree contains at least the following number of vertices:

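$$\begin{aligned}
&1 + 3\cdot 2^0 + 3\cdot 2^0 + 3\cdot 2^1 + 3\cdot 2^1 + 3\cdot 2^2 + \dots + 3\cdot 2^{\lfloor (l-1)/2\rfloor} \\
&= 1 + \left(3\cdot 2^0 + 3\cdot 2^1 + \dots + 3\cdot 2^{\lceil l/2\rceil - 1}\right) + \left(3\cdot 2^0 + 3\cdot 2^1 + \dots + 3\cdot 2^{\lfloor l/2\rfloor - 1}\right) \\
&= 1 + 3\left(2^{\lceil l/2\rceil} - 1\right) + 3\left(2^{\lfloor l/2\rfloor} - 1\right) = 3\cdot 2^{\lfloor l/2\rfloor} + 3\cdot 2^{\lceil l/2\rceil} - 5
\end{aligned}$$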

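Since l is an integer, ⌊l/2⌋ + ⌈l/2⌉ = l, and by the inequality between the arithmetic and geometric means we further bound this number of vertices as follows:

$$3\cdot 2^{\lfloor l/2\rfloor} + 3\cdot 2^{\lceil l/2\rceil} - 5 \ge 3\cdot 2\cdot 2^{l/2} - 5 = 6\cdot 2^{l/2} - 5$$

Using the inequality l = ⌊κ/2⌋ ≥ κ/2 − 1/2, we get the following:

$$6\cdot 2^{l/2} - 5 \ge 6\cdot 2^{(\kappa-1)/4} - 5$$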

We may conclude that H must contain at least 6 · 2^{(κ−1)/4} − 5 vertices. But H contains at most 2κ²/3 vertices, as proved above. This is impossible because 2κ²/3 < 6 · 2^{(κ−1)/4} − 5 for κ ≥ 26.
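The arithmetic of this proof can be sanity-checked numerically. The Python sketch below assumes the bounds as reconstructed above: the level bound 3 · 2^{⌊(i−1)/2⌋}, the closed form 3 · 2^{⌊l/2⌋} + 3 · 2^{⌈l/2⌉} − 5, and the final comparison with 2κ²/3. It checks only the inequalities, not the graph-theoretic steps, and the function names are ours.

```python
def bfs_lower_bound(l):
    """1 + sum over levels i = 1..l of 3 * 2**floor((i - 1) / 2)."""
    return 1 + sum(3 * 2 ** ((i - 1) // 2) for i in range(1, l + 1))


def closed_form(l):
    """3 * 2**floor(l/2) + 3 * 2**ceil(l/2) - 5."""
    return 3 * 2 ** (l // 2) + 3 * 2 ** ((l + 1) // 2) - 5


def contradiction_holds(kappa):
    """True if the upper bound 2*kappa^2/3 is below the lower bound on |V(H)|."""
    return 2 * kappa ** 2 / 3 < 6 * 2 ** ((kappa - 1) / 4) - 5
```

Evaluating `contradiction_holds` shows that κ = 25 is the last value where the two bounds do not clash, which matches the restriction κ ≥ 26 in the proof.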
4 An O(n³ log n)-Algorithm

We are now ready to present our first algorithm for the cyclic edge-connectivity 
of cubic graphs: 

Theorem 2. There is an algorithm for computing the cyclic edge connectivity of cubic graphs running in time O(n³ log n).

Proof. We assume that the number of vertices of the input graph G is at least 8. By Lemma 1, such a graph G always contains a cyclic edge cut. The cases when G has 2, 4 or 6 vertices can easily be handled separately, as explained at the end of Section 1. The algorithm is given by the following pseudocode (the subroutines findpaths and findcut are described in detail at the end of the proof):

Input: a cubic graph G of order at least 8

Output: a cyclic edge cut of the minimum size

cutsize := girth(G)
cut := edges incident with a cycle of length cutsize
forall v ∈ V(G) do
    forall w ∈ V(G) do
        d := -1
        paths := ∅
        repeat
            d := d + 1
            Tv := a full tree of depth d rooted at v
            Tw := a full tree of depth d rooted at w
            if Tv and Tw are not vertex-disjoint then break
            paths := findpaths(Tv, Tw, paths)
            if |paths| < 3 * 2^d and |paths| < cutsize then
                cutsize := |paths|
                cut := findcut(Tv, Tw, paths)
            fi
        while 3 * 2^d < cutsize
    endfor
endfor

output cutsize and cut
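The pseudocode above can be turned into a small runnable Python sketch for experimenting on toy graphs. The assumptions and simplifications are ours, not the authors'. The graph is a simple cubic graph given as a dict from integer vertex labels to neighbour lists. findpaths and findcut are both realised by one hypothetical helper, `min_edge_cut`, which runs unit-capacity BFS augmenting paths (Edmonds-Karp style) on G with the two trees contracted to 's' and 't', and then reads a minimum cut off the residual graph. The flow is recomputed from scratch for every depth instead of reusing the paths of depth d − 1, and each unordered pair {v, w} is processed once. A pair is abandoned when the depth-d ball around v or w does not induce a tree.

```python
from collections import defaultdict, deque
from itertools import combinations
import math


def girth(adj):
    """Length of a shortest cycle: one BFS per vertex, O(n^2) for cubic graphs."""
    best = math.inf
    for r in adj:
        dist, parent, q = {r: 0}, {r: None}, deque([r])
        while q:
            u = q.popleft()
            for x in adj[u]:
                if x not in dist:
                    dist[x], parent[x] = dist[u] + 1, u
                    q.append(x)
                elif parent[u] != x:  # non-tree edge closes a closed walk
                    best = min(best, dist[u] + dist[x] + 1)
    return best


def full_tree(adj, root, d):
    """Vertices within distance d of root if they induce a tree (in a cubic
    graph this is exactly a full tree of depth d), otherwise None."""
    dist, q = {root: 0}, deque([root])
    while q:
        u = q.popleft()
        if dist[u] < d:
            for x in adj[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    q.append(x)
    ball = set(dist)
    inner = sum(1 for u in ball for x in adj[u] if x in ball) // 2
    return ball if inner == len(ball) - 1 else None


def min_edge_cut(adj, S, T, limit):
    """Stand-in for findpaths/findcut: unit augmenting paths on G with S and T
    contracted to 's' and 't'. Returns (flow, cut); cut is a minimum S-T edge
    cut if fewer than `limit` edge-disjoint paths exist, otherwise None."""
    def node(u):
        return 's' if u in S else ('t' if u in T else u)

    cap, nbrs = defaultdict(int), defaultdict(set)
    for u in adj:
        for x in adj[u]:  # each edge is seen from both ends: capacity 1 each way
            a, b = node(u), node(x)
            if a != b:
                cap[a, b] += 1
                nbrs[a].add(b)
    flow = 0
    while flow < limit:
        prev, q = {'s': None}, deque(['s'])
        while q and 't' not in prev:
            a = q.popleft()
            for b in nbrs[a]:
                if b not in prev and cap[a, b] > 0:
                    prev[b] = a
                    q.append(b)
        if 't' not in prev:  # prev is the residual-reachable side of a min cut
            cut = [(u, x) for u in adj for x in adj[u]
                   if u < x and (node(u) in prev) != (node(x) in prev)]
            return flow, cut
        b = 't'
        while prev[b] is not None:  # augment one unit along the BFS path
            a = prev[b]
            cap[a, b] -= 1
            cap[b, a] += 1
            b = a
        flow += 1
    return flow, None


def cyclic_edge_connectivity(adj):
    """Returns (size, cut); cut is None when the edges leaving a shortest
    cycle (size = girth) are already optimal."""
    best, best_cut = girth(adj), None
    for v, w in combinations(adj, 2):
        d = 0
        while True:
            S, T = full_tree(adj, v, d), full_tree(adj, w, d)
            if S is None or T is None or S & T:
                break
            limit = min(3 * 2 ** d, best)
            flow, cut = min_edge_cut(adj, S, T, limit)
            if flow < limit:  # below the activity 3 * 2^d: cyclic by Lemma 2
                best, best_cut = flow, cut
            if 3 * 2 ** d >= best:
                break
            d += 1
    return best, best_cut
```

On the Petersen graph the sketch returns (5, None): every pair of single vertices is joined by three edge-disjoint paths, and any two balls of radius one intersect, so the girth bound is never improved.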

The algorithm first computes the girth g of the input graph G. This can be done straightforwardly in time O(n²) by running a BFS routine from each of the vertices of G. The girth g, which is of order O(log n) (Lemma 1), is an upper bound on the cyclic edge connectivity of G. The algorithm maintains the size κ of the smallest cyclic edge cut found so far (this value is initially set to the girth of G). In the main loop, the minimum edge separations between the full trees of depth d rooted at v and at w (if these two trees are vertex-disjoint) are computed for all pairs v and w of vertices of G and all values 0 ≤ d ≤ ⌈log₂(κ/3)⌉. If the size of the edge cut is smaller than 3 · 2^d, then the edge cut is cyclic by Lemma 2. Hence, we can use this edge cut as a new candidate for a minimum cyclic edge cut (if it is smaller than the smallest one found so far).

We use a simple flow algorithm to find the edge cut for each pair of full trees, which either finds 3 · 2^d edge-disjoint paths between the full trees or an edge cut of size smaller than 3 · 2^d. At each iteration, the algorithm either augments the flow (increases the number of edge-disjoint paths) between the trees or finds an edge cut of size equal to the number of paths (see also the comments at the end of the proof). The number of iterations is bounded by 3 · 2^d. Each of the iterations requires time linear in the number of edges of G, i.e., O(n). For a fixed pair of vertices, we have to run at most 3 · 2^0 + 3 · 2^1 + 3 · 2^2 + ... + 3 · 2^{⌈log₂(κ/3)⌉} iterations; hence the number of iterations of the flow algorithm for a fixed pair of vertices is at most O(g) for all the pairs of full trees of depths 1, ..., O(log g) altogether. Since moreover each iteration of the flow algorithm needs time O(n), as we discuss below, the algorithm runs in time O(n³g) = O(n³ log n). We may further improve the running time of the algorithm by using the edge-disjoint paths found between the full trees of depth d rooted at v and w as a starting set of paths between the full trees of depth d + 1 rooted at v and w.
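The iteration count can be made explicit. Under the loop bound stated above, write d_max = ⌈log₂(κ/3)⌉ for the largest depth tried, so that 2^{d_max} < 2κ/3. The geometric sum then gives

$$\sum_{d=0}^{d_{\max}} 3\cdot 2^{d} = 3\left(2^{d_{\max}+1}-1\right) < 6\cdot 2^{d_{\max}} < 4\kappa \le 4g .$$

So each of the at most n² pairs (v, w) costs O(g) iterations of O(n) time each, which gives the O(n³g) bound.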

The correctness of the algorithm follows from Theorem 1. Let κ be the cyclic edge connectivity of G. If κ = 0, then G is disconnected and the algorithm clearly works correctly. If there is a cycle of length κ in G, then the size of a minimum cyclic edge cut is equal to the girth of G and we find it at the very beginning of the algorithm. Otherwise, there exists an edge cut (A, B) of size κ such that both G[A] and G[B] contain full trees of depth ⌈log₂((κ + 1)/3)⌉ due to Theorem 1. At a certain step of the algorithm, the minimum edge separation between these two full trees was computed and its size was at most κ (the cyclic edge cut is one of the edge cuts between A and B). Since the activity of a full tree of depth ⌈log₂((κ + 1)/3)⌉ is at least κ + 1, we found a cyclic edge cut of size κ (any cut of size κ between these full trees is cyclic by Lemma 2).

We give some comments on the implementation. The algorithm uses the following subroutines: findpaths(A,B,S) and findcut(A,B,S). Both subroutines can be based on any standard flow algorithm [4] for edge-disjoint paths which, in each iteration, constructs an auxiliary graph and either enlarges the set of edge-disjoint paths or finds an edge cut corresponding to the paths. Each such iteration runs in time O(n) because G is cubic and hence its number of edges is O(n). The subroutine findpaths(A,B,S) finds the largest number of edge-disjoint paths between the vertex-disjoint subgraphs A and B, given a set S of some edge-disjoint paths between A and B (the flow algorithm is applied to the graph G with each of A and B contracted to a single vertex). The running time of this subroutine is O((k + 1 − k₀)n), where k is the number of edge-disjoint paths between A and B, k₀ is the number of paths of S and n is the number of vertices of G. This is because k + 1 − k₀ iterations of the flow algorithm have to be performed. The subroutine findcut(A,B,S) finds an edge cut between the vertex-disjoint subgraphs A and B if S is a set of the largest number of edge-disjoint paths between A and B. The running time of this subroutine is bounded by O(n).
5 An O(n² log² n)-Algorithm

We first state that if the girth or the order of a cubic graph is sufficiently large, 
then one can construct a large number of edge-disjoint subgraphs of a certain 
type in an algorithmic way: 

Lemma 5. Let G be a cubic graph of order at least 243 and girth at least five. Then G contains 12 edge-disjoint full trees of depth two. Moreover, 12 such trees can be found in time O(n), where n is the number of vertices of G.

Proof. Since the girth of G is at least five, the BFS-graph of depth two rooted at any vertex of G must be a full tree (of depth two). We find 12 vertices of G such that the distance between any pair of them is at least four. The BFS-trees of depth two rooted at such vertices are edge-disjoint and, as argued above, they are actually full trees of depth two.

Take an arbitrary vertex of G and mark this vertex together with all the vertices at distance at most three from it. Then take an unmarked vertex, mark it and also mark all the vertices at distance at most three from it. At each step, at most 1 + 3 + 6 + 12 = 22 vertices get marked. Since G has at least 243 > 22 · 11 vertices, we definitely find at least 12 such vertices. The greedy algorithm just described can easily be implemented in time O(n).
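The greedy marking procedure from this proof can be sketched in Python as follows. The function names are ours. The test graph is the generalized Petersen graph GP(n, k), built from an outer n-cycle, spokes and inner step-k edges. It serves only as a convenient cubic example: for k = 2 and large n, as here, it has girth five.

```python
from collections import deque


def ball(adj, root, radius):
    """Distances from root to all vertices at distance at most `radius` (BFS)."""
    dist, q = {root: 0}, deque([root])
    while q:
        u = q.popleft()
        if dist[u] < radius:
            for x in adj[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    q.append(x)
    return dist


def far_apart_roots(adj, count=12):
    """Pick an unmarked vertex, mark everything within distance 3, repeat.
    The chosen roots are pairwise at distance >= 4."""
    marked, roots = set(), []
    for r in adj:
        if r not in marked:
            roots.append(r)
            if len(roots) == count:
                break
            marked.update(ball(adj, r, 3))  # at most 1 + 3 + 6 + 12 = 22 vertices
    return roots


def depth_two_tree_edges(adj, r):
    """Edge set of the BFS tree of depth two rooted at r."""
    edges = set()
    for x in adj[r]:
        edges.add(frozenset((r, x)))
        for y in adj[x]:
            if y != r:
                edges.add(frozenset((x, y)))
    return edges


def generalized_petersen(n, k):
    """Cubic test graph GP(n, k): outer cycle, spokes, inner step-k cycle(s)."""
    adj = {i: [] for i in range(2 * n)}

    def link(a, b):
        adj[a].append(b)
        adj[b].append(a)

    for i in range(n):
        link(i, (i + 1) % n)
        link(i, n + i)
        link(n + i, n + (i + k) % n)
    return adj
```

For GP(122, 2), which has 244 ≥ 243 vertices, the sketch finds 12 roots whose depth-two trees have nine edges each and are pairwise edge-disjoint.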



Lemma 6. Let G be a cubic graph of girth at least g, g ≥ 13. Then G contains at least g edge-disjoint binary trees of depth ⌈log₂((g − 1)/2)⌉. Moreover, g such trees can be found in time O(n), where n is the number of vertices of G.

Proof. Let d = ⌈log₂((g − 1)/2)⌉ and D = ⌊(g − 1)/2⌋. Consider any edge e of the graph G and the BFS-graph rooted at e of depth D. Due to the girth assumption, the vertices of the levels 0, ..., D − 1 induce a tree in G. Consider the following binary trees. Let l = ⌊D/d⌋. Take two binary trees of depth d rooted at the two vertices of level 0; these trees contain vertices of levels 0, ..., d. Take 2^{d+1} binary trees of depth d rooted at the 2^{d+1} vertices of level d; these trees contain vertices of levels d, ..., 2d. Proceed in this manner up to the trees rooted at vertices of level d(l − 1). All the constructed trees are edge-disjoint. Their number is equal to 2 + 2^{d+1} + ... + 2^{d(l−1)+1} = 2 · (2^{dl} − 1)/(2^d − 1).

We check that 2 · (2^{dl} − 1)/(2^d − 1) ≥ g if g ≥ 13. If 13 ≤ g ≤ 17, then d = 3, l = 2 and hence the number of trees is 2 · (2^6 − 1)/(2^3 − 1) = 18. If 18 ≤ g ≤ 24, then d = 4, l = 2 and hence the number of trees is 2 · (2^8 − 1)/(2^4 − 1) = 34. Finally, if 25 ≤ g ≤ 26, then d = 4, l = 3 and hence the number of trees is 2 · (2^{12} − 1)/(2^4 − 1) = 546. We proceed as follows for the remaining values of g:



$$2\cdot\frac{2^{dl}-1}{2^d-1} \ge 2\cdot\frac{2^{D-d+1}-1}{g-2} \ge 2\cdot\frac{\frac{2^{D+1}}{g+1}-1}{g-2} = 2\cdot\frac{2^{D+1}-g-1}{g^2-g-2} \ge 2\cdot\frac{2^{g/2}-g-1}{g^2-g-2}$$
Here we used that dl ≥ D − d + 1, 2^d < g − 1 and D + 1 ≥ g/2. The obtained expression is greater than g if g ≥ 27.

The trees can clearly be found algorithmically in time O(n): simply take any edge e of G, run the BFS routine and output g binary trees of depth d constructed in the above way.
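The case analysis and the general estimate can be checked numerically. This sketch relies on our reading of the garbled depths: d = ⌈log₂((g − 1)/2)⌉, computed as the least d with binary-tree activity 1 + 2^{d+1} ≥ g, and D = ⌊(g − 1)/2⌋. These formulas are assumptions reconstructed from the case values quoted in the proof, not quotations of the original, and the function names are ours.

```python
def lemma6_parameters(g):
    """Return (d, D, l, number_of_trees) for girth g >= 13 (assumed formulas)."""
    d = 0
    while 1 + 2 ** (d + 1) < g:  # least d with binary-tree activity >= g
        d += 1
    D = (g - 1) // 2
    l = D // d
    trees = sum(2 ** (j * d + 1) for j in range(l))  # 2 + 2^(d+1) + ... + 2^((l-1)d+1)
    return d, D, l, trees


def general_lower_bound(g):
    """Right-hand side of the final estimate, used for g >= 27."""
    return 2 * (2 ** (g / 2) - g - 1) / (g * g - g - 2)
```

For example, `lemma6_parameters(25)` yields d = 4, D = 12, l = 3 and 546 trees, matching the last special case in the proof.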

We are now ready to present our algorithm running in time O(n² log² n):

Theorem 3. There is an algorithm for computing the cyclic edge connectivity of cubic graphs running in time O(n² log² n).

Proof. Let us briefly describe the algorithm. If the number of vertices of the input cubic graph G is smaller than 243, we run the algorithm from Theorem 2. Otherwise, we start by computing the girth g of G. This can be done straightforwardly in time O(n²) by running a BFS routine from each of the vertices of G. We create a set A_g of g edge-disjoint subgraphs of G:

- If 2 ≤ g ≤ 4, then A_g consists of any g different edges of G.

- If 5 ≤ g ≤ 12, then A_g consists of g edge-disjoint full trees of depth 2. Such a set A_g can be constructed in time O(n) due to Lemma 5.

- If 13 ≤ g, then A_g consists of g edge-disjoint binary trees of depth ⌈log₂((g − 1)/2)⌉. Such a set A_g can be constructed in time O(n) due to Lemma 6.

Note that the activity of any subgraph contained in A_g is at least g.
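A short sketch of the three cases confirms this observation. It uses the activity formulas from the definitions: 4 for a single edge with distinct ends in a simple cubic graph, 3 · 2^d for a full tree, and 1 + 2^{d+1} for a binary tree. It also assumes the binary-tree depth ⌈log₂((g − 1)/2)⌉ read off Lemma 6. The function name is hypothetical.

```python
def family_for_girth(g):
    """(description, activity) of the subgraphs placed in A_g for girth g >= 2."""
    if g <= 4:
        return "edge", 2 * 2  # each end has two further edges
    if g <= 12:
        return "full tree of depth 2", 3 * 2 ** 2
    d = 0
    while 1 + 2 ** (d + 1) < g:  # assumed depth ceil(log2((g - 1) / 2))
        d += 1
    return f"binary tree of depth {d}", 1 + 2 ** (d + 1)
```

For g = 13, for instance, the members of A_g are binary trees of depth 3 with activity 17.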

Our algorithm computes a minimum edge separation for all pairs consisting of a subgraph A ∈ A_g and a full tree of depth d rooted at a vertex v ∈ V(G), for 0 ≤ d ≤ ⌈log₂(κ/3)⌉, where κ is the size of the smallest cyclic edge cut found so far (initially κ = g). As in the algorithm from Theorem 2, we use a simple flow algorithm to compute edge-disjoint paths between a full tree and a subgraph A. We also use the paths between the full tree of depth d and A as an initial set of paths between the full tree of depth d + 1 and A. If the edge cut found is smaller than 3 · 2^d (the activity of the full tree of depth d) and smaller than g, the edge cut is cyclic (by Lemma 2), and if it is smaller than the cyclic edge cut found so far, we have a new upper bound on the cyclic edge connectivity.

The number of iterations for a fixed pair of a subgraph A and a vertex v is O(g) = O(log n) for all the full trees rooted at v together. Each iteration takes time O(n). The number of subgraphs in A_g is g = O(log n), and thus the running time of the whole algorithm is O(n² log² n).

We prove the correctness of our algorithm. Let κ be the size of the smallest cyclic edge cut of G. If κ = 0, then G is disconnected and the algorithm clearly finds the empty cyclic edge cut. If κ = g, then the cyclic edge cut of size g is found in the first phase of the algorithm. If κ < g, then there is a cyclic edge cut (B, C) such that both G[B] and G[C] contain full trees of depth ⌈log₂((κ + 1)/3)⌉ due to Theorem 1. Besides this, one of the graphs G[B] and G[C] contains a subgraph A ∈ A_g, because one of the subgraphs of A_g does not contain an edge of the cut (the subgraphs of A_g are edge-disjoint, their number is g and the size of the cyclic edge cut is κ < g). Assume that G[B] does. Now, G[B] contains a subgraph A from the set A_g and G[C] contains a full tree of depth ⌈log₂((κ + 1)/3)⌉. At the step when we consider the pair consisting of the subgraph A and a full tree of depth ⌈log₂((κ + 1)/3)⌉ in G[C], we find a cyclic edge cut of size κ (we find an edge cut of size at most κ, and such an edge cut is cyclic by Lemma 2).

6 Future Research 

In this section, we briefly discuss two directions for possible future research. We have already obtained partial results in both directions, but we keep working on improving them.



Graphs of Bounded Degree

Our algorithms can be extended to the case of graphs with maximum vertex degree D, preserving the running times O(n³ log n) and O(n² log² n). However, in this case, the running times involve huge multiplicative constants which make both algorithms impractical. We sketch the main ideas. First, remove vertices of degree zero and one and suppress vertices of degree two (this does not change the cyclic edge connectivity). Since the minimum degree of the graph is now at least three, its girth g is at most O(log n) and its cyclic edge connectivity is at most Dg. Consider a cyclic edge cut (A, B) of size κ. Let us call cut vertices the vertices incident with the edges of the cut. Either A contains at most κ^{1+log₂ D} vertices or it contains a vertex whose distance from each cut vertex is at least log₂ κ (the proof of this fact proceeds in a way analogous to the proof of Lemma 4). As in Theorem 1, if κ ≤ Dg, we may conclude in the former of the two cases that the girth of A is at most O(log κ^{1+log₂ D}) = O(log D log Dg), which is impossible unless g ≤ O(log² D).

Based on the discussion in the previous paragraph, we may conclude: either the cyclic edge connectivity of an input graph is bounded by O(log² D) or there is a smallest cyclic edge cut of size κ such that each part contains a BFS-tree of depth at least log₂ κ (this is an analogue of a full tree). Hence, it is enough to try to separate BFS-trees of depth at most O(log₂(Dg)) and subgraphs of activity at most O(log² D). The number of such pairs is at most O(n²), where the hidden
multiplicative constant is of order The main goal of Theorem 1 is 

to reduce the number of subgraphs of bounded activity which need to be checked to one, namely to a cycle, which is what actually makes both our algorithms for cubic graphs practical.

Of course, if the order of the input graph is large enough, it is possible to find a sufficient number of large edge-disjoint subgraphs, similarly as in Lemmas 5 and 6, and to decrease the number of pairs to separate to O(n log n), which gives an algorithm running in time O(n² log² n).



Arbitrary Graphs 

We sketch an 0(n^^)-time algorithm for arbitrary graphs: A modified BFS- 
routine is used to find a cycle with the smallest activity (let a be this activity) . If 
the activity of such a cycle is not an upper bound on the cyclic edge connectivity,
the input graph is one of the several special types which are handled separately. 
In the general case, the algorithm ranges through all paths with activity approx- 
imately a (there are at most O(n^) pairs of such paths) and it finds a separation 
for each such pair. The smallest edge separation is the sought cyclic edge cut. 
The running time can be further improved by showing that it is enough to range 
only through pairs of a certain special type as pointed out to us by Jiff Sgall. 

Acknowledgement. This research was started when the authors took part in the DIMACS/DIMATIA Research Experience for Undergraduates programme in 2001 under the supervision of Jeff Kahn from Rutgers University. This programme is a joint project of DIMACS at Rutgers University and DIMATIA at Charles University. We temporarily stopped working on this problem when the programme had finished, and we restarted our work at the end of 2002.

The authors thank all the anonymous referees for their comments which 
helped to improve the style of this paper and the clarity of our arguments. 



References 

1. R. E. L. Aldred, D. A. Holton, B. Jackson: Uniform cyclic edge connectivity in cubic graphs, Combinatorica 11 (1991), 81-96.

2. L. D. Andersen, H. Fleischner, B. Jackson: Removable edges in cyclically 4-edge-connected cubic graphs, Graphs Comb. 4 (1) (1988), 1-21.

3. R. L. Caret, K. J. Denniston, J. J. Topping: Principles and applications of inorganic, organic & biological chemistry, Wm. C. Brown Publishers, Boston (1997).

4. T. H. Cormen, C. E. Leiserson, R. L. Rivest: Introduction to Algorithms, MIT Press, Boston (1990).

5. H. Fleischner, B. Jackson: A note concerning some conjectures on cyclically 4-edge connected 3-regular graphs, Ann. Disc. Math. 41 (1989), 171-177.

6. D. Lou, L. Teng, X. Wu: A polynomial algorithm for cyclic edge connectivity of cubic graphs, Australasian J. Comb. 24 (2001), 247-259.

7. D. Lou: private communication.

8. W. McCuaig: Cycles through edges in cyclically k-connected cubic graphs, Discrete Math. 103 (1) (1992), 95-98.

9. W. McCuaig: Edge reductions in cyclically k-connected cubic graphs, J. Comb. Theory Ser. B 56 (1) (1992), 16-44.

10. R. Nedela, M. Škoviera: Atoms of cyclic connectivity in cubic graphs, Math. Slovaca 45 (1995), 481-499.

11. R. Nedela, M. Škoviera: Decompositions and reductions of snarks, J. Graph Theory 22 (1996), 253-279.

12. B. Peroche: On several sorts of connectivity, Discrete Math. 46 (1983), 267-277.

13. M. D. Plummer: On the cyclic connectivity of planar graphs, in: Graph Theory and Applications, Springer-Verlag, Berlin (1972), 235-242.

14. P. G. Tait: Remarks on the colouring of maps, Proc. Roy. Soc. Edinburgh 10 (1880), 501-503.




Subexponential-Time Framework for Optimal 
Embeddings of Graphs in Integer Lattices 



Anders Dessmark, Andrzej Lingas, and Eva-Marta Lundell 

Department of Computer Science, Lund University, Box 118, 221 00 Lund, Sweden. 
{Anders.Dessmark, Andrzej.Lingas, Eva-Marta.Lundell}@cs.lth.se



Abstract. We present a general framework for computing various opti- 
mal embeddings of undirected and directed connected graphs in two and 
multi-dimensional integer lattices in time sub-exponential either in the 
minimum number n of lattice points used by such optimal embeddings 
or in the budget upper bound b on the number of lattice points that 
may be used in an embedding. The sub-exponential upper bounds in the 
two dimensional case and d-dimensional case are respectively of the form 

20(\/hflogn) 2d*(\d6 log 6) log ri) log b) 

where l stands for the degree of the allowed overlap. For the problem of minimum total edge length planar or multi-dimensional embedding or layout of a graph and the problem of optimal protein folding in the so-called HP model, we obtain the upper bounds in terms of n. Note that in the case of protein folding, n is also the size of the input. The list of problems for which we can derive the upper bounds in terms of b includes, among other things:

1. a minimum area planar embedding or layout of a graph, 

2. a minimum bend planar or three dimensional embedding or layout, 

3. a minimum maximum edge length planar or three dimensional embedding or layout.



1 Introduction 

An embedding of a graph into a multi-dimensional integer lattice is a mapping 
which maps each vertex of the graph onto a lattice point and each edge of the 
graph onto a straight-line path on the lattice. 

Problems of finding optimal embeddings of graphs in two and three dimensional integer lattices arise, among other things, in graph drawing [17,19], VLSI layout [11,23] and protein folding [6,4,16]. Such problems are typically NP-hard, e.g., the problems of finding a minimum total edge length, minimum area or minimum bend planar grid embedding of a planar graph with maximum degree at most four [21] (see also [8]) or the problem of optimal protein folding in the so-called HP model [6,4,16].

¹ In contrast, the corresponding problems for plane graphs, i.e., planar graphs with a fixed planar embedding, often admit fast algorithms [17,22].

In this paper, we develop a general framework for solving the optimization graph embedding problems exactly in time subexponential in the total number of lattice points used by an optimal embedding or in a budget constraint on the number of lattice points which may be used. Our framework relies on a very simple balanced separator for an embedded graph which has the form of a lattice hyperplane.

The simplicity of the aforementioned separator distinguishes it from such known balanced separators as planar graph separators [12], separators for graphs with an excluded minor [1], separators for k-ply systems [14], etc. We provide a linear-time algorithm for constructing the simple separator. In fact, while writing our paper we found that a similar construction and proof had already appeared in a technical report by Raghavan [20]. Since the translation of Raghavan's result from its numerical context into our terms would require about the same space as our construction, we provide the latter.

By using the existence of the simple separator, we design a nested-dissection dynamic programming framework à la Arora [2], yielding subexponential algorithms for several important NP-hard optimization graph embedding problems (see Section 4).

Our subexponential upper time-bounds are expressed in terms of either the minimum number n of lattice points used by a corresponding optimal embedding or the budget upper bound b on the number of lattice points that may be used in an embedding. The bounds in the two dimensional case and
d-dimensional case are respectively of the form and 

20 (di ^ ^ where I stands for the degree of the 
allowed overlap. For the problem of minimum total edge length planar or multi-dimensional embedding or layout of a graph and the problem of optimal protein folding in the so-called HP model, we obtain the upper bounds in terms of n. Note that in the case of protein folding, n is also the size of the input. The list of problems for which we can derive the upper bounds in terms of b includes, among other things:

1. a minimum area planar embedding or layout of a graph, 

2. a minimum bend planar or three dimensional embedding or layout, 

3. a minimum maximum edge length planar or three dimensional embedding or layout.

Our paper is structured as follows. In the next short section, we formalize our model of graph embedding. Section 3 is devoted to the derivation of the hyperplane balanced separators for graph embeddings in two-dimensional and multi-dimensional integer lattices. In Section 4, we present our general subexponential-time framework by designing subexponential algorithms for the problem of minimum total edge length embedding in the planar and multi-dimensional case. In Section 5, we obtain analogous subexponential-time algorithms for the problem of protein folding in the HP model in the planar and three-dimensional cases. In Section 6, we briefly discuss analogous subexponential-time algorithms for the problems of minimum area, minimum bend, and minimum maximum edge length embedding or layout.
2 Models of Graph Embeddings in Lattices

An embedding with overlap l of a (directed or undirected) graph G = (V, E) into a lattice Z^d is a function which maps each vertex of G onto a point in Z^d and each edge of G onto a path on the lattice such that, for every lattice point, the total number of vertices and interiors of edges of G mapped onto it is not greater than l. The lattice points onto which the vertices and edges of G are mapped are called the nodes of the embedding. By convention, we shall sometimes call an embedding with overlap 1 a planar embedding and an embedding with overlap greater than 1 a layout.
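The definition can be made concrete with a small checker. The data layout in this Python sketch is our own choice: a dict from vertices to lattice points (integer tuples) and a dict from edges to the list of lattice points along their paths. Every lattice point is charged once per vertex mapped onto it and once per edge whose interior passes through it, and the charges are compared with the overlap l. The returned set is the set of nodes of the embedding.

```python
from collections import Counter


def embedding_nodes(vertex_pos, edge_paths, overlap):
    """Validate an embedding with the given overlap and return its node set."""
    load = Counter(vertex_pos.values())  # vertices mapped onto each lattice point
    for (u, v), path in edge_paths.items():
        if path[0] != vertex_pos[u] or path[-1] != vertex_pos[v]:
            raise ValueError(f"path of edge {(u, v)} has wrong endpoints")
        for p, q in zip(path, path[1:]):  # consecutive points must be lattice neighbours
            if sum(abs(a - b) for a, b in zip(p, q)) != 1:
                raise ValueError(f"edge {(u, v)} is not a lattice path")
        load.update(path[1:-1])  # interior points of the edge
    if max(load.values()) > overlap:
        raise ValueError("overlap exceeded")
    return set(load)
```

Two edges crossing at a lattice point, for example, form a valid layout with overlap 2, but the checker rejects them as a planar embedding (overlap 1).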

3 Separators for Graph Embeddings in Integer Lattices 

In this section, we show how to construct a balanced separator of relatively small size for a graph embedded in a lattice, in the form of a lattice hyperplane. We provide a linear-time algorithm for constructing this separator. In fact, while writing our paper we found that a similar construction had already appeared in a numerical context in a technical report by Raghavan [20]. The translation of Raghavan's result into our terms would require about the same space as our construction. Similar proof techniques can also be found, e.g., in [10].

Lemma 1. Suppose that an embedding E with overlap l and n nodes of a not necessarily connected graph into the two-dimensional lattice Z² is given. In time linear in n, one can find a vertical or horizontal lattice line that contains at most k√(ln) nodes of E, for any k > 2/√3, and splits E into two parts (overlapping along the line), each containing at most 5/6 of the nodes in E, provided that the number n of nodes of E is greater than 36k²/(√3k − 2)².

Proof. Let V_l be the leftmost vertical lattice line such that there are at least n/6 nodes of E in total on the line or to the left of it. Symmetrically, let V_r be the rightmost vertical lattice line such that there are at least n/6 nodes of E in total on the line or to the right of it. Note that V_l and V_r are well defined and V_l cannot lie to the right of V_r. Let BV be the block of consecutive vertical lattice lines starting with V_l and ending with V_r.

If BV contains a line different from V_l and V_r that includes at most k√(ln) nodes of E, we are done.

Otherwise, each inner line in BV contains more than k√(ln) nodes.

Then, since the inner lines of BV contain at most 2n/3 nodes altogether (at least n/6 nodes lie on or to the left of V_l and at least n/6 on or to the right of V_r), BV includes at most 2√n/(3k) + 2 vertical lines (recall that l ≥ 1).

Analogously, we define the block BH of horizontal lattice lines. Similarly, if BH contains a horizontal line different from the extreme lines that includes at most k√(ln) nodes of E, we are done. Otherwise, BH contains at most 2√n/(3k) + 2 horizontal lines.

Suppose that neither BV nor BH contains such a separating line. Let B be the intersection of BV with BH. Note that B contains at least n/3 nodes of E.
On the other hand, since each side of B is not longer than 2√n/(3k) + 2, B cannot contain more than (2√n/(3k) + 2)² lattice points. Since (2√n/(3k) + 2)² < n/3 for n > 36k²/(√3k − 2)², we obtain a contradiction.

To find the separator, we first compute a bounding box for E by traversing E and finding the extreme coordinates. First assume that the sides of the bounding box are of length at most n. Recall that n stands for the number of nodes in the embedding E, which might be substantially larger than the number of vertices in the input graph. In this case, we first find the number of nodes on each lattice line by traversing the embedding and updating counters for the encountered horizontal and vertical lattice lines; this can be done in time linear in n. To find the number of nodes of E to the left and to the right of each line, we simply apply prefix sums. We proceed similarly in the vertical direction. When a side of the bounding box is longer than n, we first compute the median of the corresponding coordinates of the nodes and consider only the n lattice lines centered on the median. We compute the number of nodes only on these lattice lines, and the extreme nodes (nodes with a coordinate more than n/2 smaller or larger than the median) are counted together in a left and a right (or top and bottom, respectively) sum. We compute the prefix sums, and if a separator with the desired properties exists for the axis, it clearly lies within the considered set of lattice lines.

We conclude that we can find a desired separating line in time O(n). □
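The search for the separating line can be sketched compactly in Python. This is a simplification of the procedure just described: it sorts the coordinates and uses binary search instead of linear-time counters, prefix sums and the median trick. It therefore runs in O(dn log n) rather than O(n) time, and the function name and interface are our own. With points in Z², one would pass max_on_plane = k√(ln) and max_fraction = 5/6 as in Lemma 1. The same code accepts points of any dimension, in the spirit of the d-dimensional counterpart below, with thresholds supplied by the caller.

```python
from bisect import bisect_left, bisect_right


def separating_hyperplane(points, max_on_plane, max_fraction):
    """Return (axis, c) for an axis-parallel lattice hyperplane x[axis] = c that
    holds at most max_on_plane nodes and leaves at most max_fraction * n nodes
    on each closed side, or None. `points` are the distinct nodes (tuples)."""
    n = len(points)
    for axis in range(len(points[0])):
        coords = sorted(p[axis] for p in points)
        # Occupied coordinates plus the empty line just after each of them;
        # any other empty line between two occupied ones gives identical counts.
        for c in sorted(set(coords) | {x + 1 for x in coords}):
            on_or_left = bisect_right(coords, c)
            on_or_right = n - bisect_left(coords, c)
            on_plane = on_or_left + on_or_right - n
            if (on_plane <= max_on_plane
                    and on_or_left <= max_fraction * n
                    and on_or_right <= max_fraction * n):
                return axis, c
    return None
```

For a fully occupied 10 × 10 grid with k = 2 and l = 1, the first admissible line is the vertical line x = 2: it holds 10 nodes, with 30 nodes on or to its left and 80 on or to its right.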

Analogously, we can derive a d-dimensional counterpart of Lemma 1. 

Lemma 2. Suppose that an embedding E with overlap I of a non-necessarily 
connected graph into d-dimensional lattice is given and n is the number of 
nodes of E. In time 0{dn), one can find a lattice hyperplane parallel to one of 
the d axes that contains at most , for any k > 3^/*^ — , nodes 

of E and splits the lattice into two parts (overlapping on the hyperplane), each 
containing at most (1— ^)n nodes of E if the number n of nodes of E is greater 

]( xd 

inan H 1 ■ 

Proof. The proof is a straightforward generalization of that of Lemma 1 to d dimensions. Instead of lattice straight lines we use lattice hyperplanes parallel to one of the d axes. We define analogously d blocks B_1, ..., B_d of hyperplanes, respectively parallel to the axes 1, ..., d, splitting the lattice into two parts
such that each part (including the hyperplane) contains at most (1 — ^)n nodes 
of E. If at least one of the inner hyperplanes in the blocks contains 
nodes we are done. 

Otherwise, we obtain a contradiction by considering the intersection B of the d
blocks. Note that the intersection contains at least n/3 nodes of E. On the other
hand, since each side of B is not longer than […] + 2, the intersection cannot
contain more than […] lattice points, which is less than n/3 for n > […]; we
obtain a contradiction.

Assuming k = 2, the smallest value of n for which the lemma holds is 284 for
d = 3 and 1154 for d = 4.
To find the number of nodes in each orthogonal lattice hyperplane, we proceed
as in the two-dimensional case. Computing the number of nodes on lattice
hyperplanes will now require O(d) time per node, and since we compute prefix
sums (and possibly the median) separately for each of the d axes, the total
required time is O(dn). □



4 An Algorithm for Minimum Total Edge Length Embedding

In the minimum total edge length embedding, we are given a connected graph G
and the overlap parameter l. Note that the total edge length of an embedding is
determined by the total number n of nodes in the embedding, i.e., the images of
the vertices of G together with the edge nodes: an edge modeled by a lattice path
of length L contributes L - 1 edge nodes, so the total edge length equals
n - |V| + |E|, where |V| and |E| are fixed by G. Thus, we need to find an
embedding of G in a lattice with overlap l achieving the minimum number n of
nodes in order to minimize the total edge length. We shall use this problem to
demonstrate our framework.
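A small Python sketch makes the relation between total edge length and node count concrete. It assumes (as a hypothetical representation, not the paper's) that each edge is given as a lattice path listing all its lattice points, endpoints included, so that the interior points of a path are exactly its edge nodes; the helper name `length_and_node_count` is likewise hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def length_and_node_count(num_vertices: int,
                          edge_paths: List[List[Point]]) -> Tuple[int, int]:
    """Return (total edge length, n) for an embedding whose edges are given
    as lattice paths listing every lattice point, endpoints included.

    n counts the vertex images plus the interior (edge) nodes of all paths.
    """
    total_length = sum(len(path) - 1 for path in edge_paths)  # unit segments
    edge_nodes = sum(len(path) - 2 for path in edge_paths)    # interior points
    return total_length, num_vertices + edge_nodes


# Triangle: two edges of length 1 and one edge routed around a corner.
paths = [[(0, 0), (1, 0)], [(0, 0), (0, 1)], [(1, 0), (1, 1), (0, 1)]]
length, n = length_and_node_count(3, paths)
assert length == n - 3 + len(paths)   # total length = n - |V| + |E|
```

Since |V| and |E| do not depend on the embedding, minimizing n and minimizing the total edge length are the same objective.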



4.1 Embedding in the Plane 

Using Lemma 1, we can recursively divide several embedding problems into sub- 
problems, while the separator set is kept small enough to make the recombination 
efficient. We shall use this to find a minimum node-cost embedding by dynamic 
programming. 



Definition 1. Let E be an embedding with overlap l of a not necessarily
connected (directed or undirected) graph in Z^2. Furthermore, fix a k > 2/√3.
Let b be the number of nodes of E. A horizontal or vertical line on the lattice is
a separating line (with respect to E) if and only if it satisfies the thesis of
Lemma 1. Let B be the bounding box of E, i.e., the smallest rectangle on Z^2
including E. A separating partition of B is its recursive partition by separating
lines until the total number of nodes of E within each resulting rectangle is at
most

[…] l(4 + 4√3 k + 3k^2) / (3k^2 - 4)^2.



By Lemma 1, we immediately obtain the following lemma.



Lemma 3. For any embedding E with overlap l of a not necessarily connected
(directed or undirected) graph in Z^2, there is a separating partition of E.



Theorem 1. Suppose that an embedding with overlap l of a connected (directed
or undirected) graph G into the integer lattice Z^2 exists and n is the minimum
number of lattice points used in such an embedding. A minimum total edge length
embedding with overlap l of G can be found in time […].
Proof. First we shall show that if an embedding with overlap l of G into Z^2
under a budget constraint b on the number of nodes exists, then a minimum total
edge length embedding with overlap l of G can be found in time […].

We may assume that the lattice is constrained to the b × b lattice.

We shall find an optimal embedding with overlap l by dynamic programming,
solving appropriate subproblems in a bottom-up fashion. Each subproblem is
given by:

1. a rectangle R on the lattice,

2. a subset P of the set of vertices of G and the set of the edge nodes (i.e.,
auxiliary vertices to be assigned to points of the lattice in order to model
an edge in G by a path on the lattice; no edge can have more than b such
nodes) to be located on the perimeter of R such that on each side of R there
are at most k√(lb) of them (e.g., P can be described by a sequence of pairs
(p, q) where p is a lattice point on the perimeter of R and q is a vertex of G
or a pair consisting of an edge of G and the number of its edge node between
1 and b),

3. for each vertex v of G in P and each neighbor w of v in G, information on
which of the four directions the path modeling the edge (v, w) is expected
to come from; similarly, for each edge node u and each of its neighbors on
the path modeling the edge, information about the direction leading to the
neighbor.



We shall call such a subproblem perimeter feasible if the number of elements
in P located on any lattice point on the perimeter of R is at most 1.

The number of such subproblems is calculated as follows. The number of
rectangles is O(b^4), and the number of possible ways to select perimeter points
on each rectangle is O([…]) = O(b^{[…]}). For each of these perimeter positions,
there is a choice of O(b^{[…]}) vertices and edge nodes, since we are unable to
decide how the lengths of the different paths are distributed. Finally, there are 4
different directions for each. All this amounts to […] subproblems.

Such a subproblem induces a subset S of vertices of G that are to be embedded
within R. The subset S can easily be found in time polynomial in n by
depth-first searches from the vertices of G that have to be within R according
to the directional information on P. This relies on the connectedness of G.

A solution to such a subproblem is an embedding with overlap l of the subgraph
of G induced by S, extended by a minimum number of edge nodes in P within R,
compatible with the fixed position of the elements in P on the perimeter of R
(provided that a compatible embedding is possible).

The perimeter feasible subproblems for which the induced set S of vertices
extended by the edge nodes in P contains at most […] elements can easily be
solved in polynomial time by enumerating all possible embeddings with overlap l.
Larger perimeter feasible subproblems can be solved by splitting them into two
smaller compatible subproblems by a separating line in all possible ways. To find
an optimal solution to the subproblem, it is sufficient to take the minimum of the
total cost of the union of the optimal solutions to the two subproblems over all
possible partitions by a separating line. All potential separating lines, and hence
all pairs of such compatible subproblems, can easily be found in time […].

Now, to complete the proof, it is sufficient to apply the so-called exponential
search with respect to b, running the method with the budget constraint b and
doubling b until a valid embedding is found. The embedding found in this way
will be optimal, and the total time cost will be […].
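The doubling step can be sketched in Python as follows. The callable `solve_with_budget` is a hypothetical stand-in for the budgeted dynamic program: it is assumed to return `None` when no embedding with at most b nodes exists and otherwise an optimal embedding under that budget. Because each budgeted call is exact, the first successful budget already yields a global optimum; and when the cost of a call at least doubles as b doubles, the total time is dominated by the last call.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def exponential_search(solve_with_budget: Callable[[int], Optional[T]],
                       b: int = 1) -> Tuple[int, T]:
    """Call solve_with_budget(b) for b, 2b, 4b, ... and return the first
    budget for which it reports a solution, together with that solution.

    Loops forever if no budget ever succeeds, so it relies on the
    assumption that some embedding exists.
    """
    while True:
        result = solve_with_budget(b)
        if result is not None:
            return b, result
        b *= 2


# Toy stand-in for the budgeted dynamic program: the optimum uses 37 nodes.
tried: List[int] = []


def toy_solver(b: int) -> Optional[int]:
    tried.append(b)
    return 37 if b >= 37 else None


assert exponential_search(toy_solver) == (64, 37)
assert tried == [1, 2, 4, 8, 16, 32, 64]
```

The final budget is less than twice the optimum number of nodes, so any bound stated in terms of b also holds in terms of n up to constant factors in the exponent.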



4.2 Embedding in a d-Dimensional Lattice 

In this section we briefly present the analogous result for embeddings in the 
d-dimensional lattice for arbitrary d. 

A separating partition in d dimensions is a straightforward generalization of 
Definition 1. Lemma 3 clearly holds for d dimensions. 

Theorem 2. Suppose that an embedding with overlap l of a connected (directed
or undirected) graph G into the integer lattice Z^d exists. A minimum total edge
length embedding with overlap l of G can be found in time 2^{O([…] log n)}.

Proof. As in the two-dimensional case, it is sufficient to derive the upper time
bound in terms of the budget constraint b on the number of nodes in an
embedding.

We assume that the lattice is constrained to the b^d lattice.

Each subproblem for our dynamic programming procedure is given by:

1. a d-dimensional rectangle R on the lattice,

2. a subset P of vertices of G and edge nodes located on the perimeter of R
such that on each side of R there are at most k l^{1/d} b^{(d-1)/d} of them,

3. for each vertex v of G in P and each neighbor w of v in G, information on
which of the 2d directions the path modeling the edge (v, w) is expected
to come from; similarly, for each edge node u and each of its neighbors on
the path modeling the edge, information about the direction leading to the
neighbor.

In the d-dimensional case, the calculation of the number of subproblems
changes as follows. The number of rectangles is O(b^{2d}), and the number of
possible ways to select perimeter points on each rectangle is O([…]). For each
of these perimeter positions, there is a choice of O(b^{[…]}) vertices and edge
nodes. Finally, there are 2d different directions for each. This amounts to […]
subproblems.

Clearly, as in the planar case, the time complexity of the dynamic programming
algorithm is linear in the number of subproblems.
5 Protein Folding

The problem of protein folding is to predict the three-dimensional structure of 
a protein molecule specified by a sequence of amino acids solely on the basis of 
the sequence. 

Dill proposed a simple model of a protein, the so-called HP model, in [6].
In this model, the amino acids are classified as either hydrophilic (P)
or hydrophobic (H), and a protein is then modeled as a consecutive chain of
hydrophilic and hydrophobic elements. The folding of a protein is modeled by
defining a bond as a pair of adjacent hydrophobic elements that are not
neighbours in the chain. It is generally assumed that the folding of a protein
which maximizes the number of such bonds in this model corresponds to the
folding of the protein in nature.

The problem of protein folding in the HP model can be further simplified
into the following problem of string folding: given a binary string, determine its
embedding onto Z^2 or Z^3 such that any two consecutive elements in the string
are mapped on neighboring lattice points and the number of pairs of adjacent
lattice points which are images of 1s is maximized.
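To make the objective concrete, here is a small Python scoring function for a proposed fold in Z^2. It is a sketch with a hypothetical name, `fold_score`, assuming the fold is given as one lattice point per string element. It checks overlap 1 and unit steps, and counts bonds as in the HP model above: pairs of 1s on adjacent lattice points that are not consecutive in the string. Counting consecutive pairs as well would only add a constant that depends on the string, not on the fold.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[int, int]


def fold_score(s: str, fold: Sequence[Point]) -> int:
    """Number of HP-model bonds of a fold of the binary string s in Z^2.

    A bond is a pair of 1s placed on adjacent lattice points that are not
    consecutive in s. Raises ValueError if `fold` is not a valid fold.
    """
    if len(s) != len(fold):
        raise ValueError("need exactly one lattice point per string element")
    if len(set(fold)) != len(fold):
        raise ValueError("fold is not self-avoiding (overlap must be 1)")
    for (x1, y1), (x2, y2) in zip(fold, fold[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive elements must be lattice neighbours")
    index: Dict[Point, int] = {p: i for i, p in enumerate(fold)}
    score = 0
    for i, (x, y) in enumerate(fold):
        if s[i] != "1":
            continue
        # Look only right and up so that each unordered pair is counted once.
        for q in ((x + 1, y), (x, y + 1)):
            j = index.get(q)
            if j is not None and s[j] == "1" and abs(i - j) > 1:
                score += 1
    return score
```

For example, folding `"1111"` into a unit square brings the first and last elements together and scores one bond, while the straight fold scores none.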




Fig. 1. An example of a string folding, where dark dots represent 1s and light
dots represent 0s.

The string folding problem is known to be NP-hard [4,5,16,18]. The best
known approximation factor for this problem achievable in polynomial time is
[…] [9]. Unfortunately, the usefulness of such approximations of string folding
to model protein folding is doubtful, since even small deviations from the
so-called native structure can disrupt the functionality of a protein. Thus, it is
desirable to produce an optimal or almost optimal string folding. Given that
proteins range in length from around fifty amino acids to typically hundreds and
upwards, the exhaustive search approach is infeasible even for the smallest
molecules. For this reason, subexponential algorithms for optimal string folding
are of high interest.
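For contrast with the subexponential approach, exhaustive search can be sketched in a few lines of Python (hypothetical names, not from the paper). The sketch enumerates every self-avoiding walk in Z^2, fixing the first step to exploit rotational symmetry, and keeps the best bond count. The number of such walks grows exponentially with the string length, which is why this is feasible only for toy inputs.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def bonds(s: str, path: List[Point]) -> int:
    """Pairs of 1s on adjacent lattice points that are not consecutive in s."""
    index: Dict[Point, int] = {p: i for i, p in enumerate(path)}
    count = 0
    for i, (x, y) in enumerate(path):
        if s[i] != "1":
            continue
        for q in ((x + 1, y), (x, y + 1)):   # look right/up: each pair once
            j = index.get(q)
            if j is not None and s[j] == "1" and abs(i - j) > 1:
                count += 1
    return count


def best_fold_bruteforce(s: str) -> int:
    """Maximum number of bonds over all self-avoiding folds of s in Z^2.

    Enumerates every self-avoiding walk, so the running time grows
    exponentially with len(s); usable only for very short strings.
    """
    if len(s) < 2:
        return 0
    best = 0
    path: List[Point] = [(0, 0), (1, 0)]   # first step fixed by symmetry
    occupied: Set[Point] = set(path)

    def extend() -> None:
        nonlocal best
        if len(path) == len(s):
            best = max(best, bonds(s, path))
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            q = (x + dx, y + dy)
            if q not in occupied:
                path.append(q)
                occupied.add(q)
                extend()
                path.pop()
                occupied.remove(q)

    extend()
    return best
```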

To adapt the problem of string folding to our framework, we can view it as
the problem of embedding with overlap 1 a path with vertices labeled by 0 or 1
into Z^2 or Z^3 so that each edge of the path is mapped on a unit lattice segment
and the number of pairs of adjacent lattice points which are the images of the
vertices labeled by 1 is maximized.

This maximization embedding problem can be solved analogously to that of
minimum total edge length. Each subproblem in the planar case is specified by
a rectangle on the lattice, a mapping of a subset of O(√n) vertices of the input
path onto lattice points on the perimeter of the rectangle, for each of these
vertices one of the four possible lattice directions for string continuation, and
finally the information on whether the first element is to be located within the
rectangle. Such a specification immediately induces a set of subpaths of the
input path to be embedded within the rectangle.

A solution to such a subproblem is an optimal planar embedding of the set
of subpaths within the rectangle compatible with the fixed positions of their
endpoints on the rectangle perimeter and the fixed continuation directions
(provided that a compatible embedding is possible). The optimality criterion is
the number of pairs of vertices labeled with 1 that are mapped on adjacent
lattice points. Special care must be taken in the combination step to avoid
counting a bond between two 1s on the perimeter twice.

By straightforward calculations, the number of subproblems is […] in the
case of Z^2 and 2^{O([…] log n)} in the case of Z^d.

Theorem 3. An optimal folding of a binary string of length n in Z^2 can be
found in time […], and an optimal folding of the string in Z^3 can be found in
time […].

6 Other Problems 

Throughout this section, we shall assume that a budget constraint b on the
number of nodes in the sought embedding is given. By simplifying the method
given in the proofs of Theorems 1 and 2, we easily obtain the following lemma.

Lemma 4. One can decide whether or not an embedding with overlap l of a
connected (directed or undirected) graph on the integer lattice Z^d under the
budget constraint b exists in time […].

Hence, we shall assume in the following subsections that such an embedding
under the budget constraint b exists.

6.1 Minimum Area Embedding and Layout 

The size of the bounding box of a graph embedding or layout is often called the
area of the embedding or layout [11]. To find one of minimum area, we use a
method analogous to that for minimum total edge length. In fact, for each of
the subproblems we only need to find a feasible solution. Among the positively
solved subproblems whose perimeter is touched by the embedding only from the
inside, the one with the smallest rectangle area yields the optimum.

Theorem 4. A minimum area embedding with overlap l of a connected (directed
or undirected) graph on the integer lattice Z^2 under the budget constraint b
can be found in time […].

As in the case of the minimum total edge length problem, the method for
minimum area can be naturally generalized to the minimum volume problem for
a graph embedding or layout in the d-dimensional integer lattice.

6.2 Minimum Bend Embedding and Layout 

The difference between the problem of minimum total edge length embedding or
layout and that of minimum bend embedding or layout is that in the latter we
count solely the nodes at which the lattice paths modeling edges bend. An
involved proof of the NP-hardness of the minimum bend embedding problem for
planar graphs of maximum degree four has been given by Storer in [21]. By
appropriately modifying the optimization criterion in our algorithms for the
problem of minimum total edge length embedding, we easily obtain the following
theorems.
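The modified criterion only needs to count the nodes where a path changes direction. A minimal Python sketch (hypothetical helper `count_bends`, paths given as lists of lattice points in any dimension d) could look like this; the bend count of a whole embedding would be the sum over all edge paths.

```python
from typing import Sequence, Tuple

Point = Tuple[int, ...]


def count_bends(path: Sequence[Point]) -> int:
    """Number of interior nodes of a lattice path where the direction changes.

    Works in any dimension d; consecutive points are assumed to differ by a
    unit step along one axis.
    """
    bends = 0
    for a, b, c in zip(path, path[1:], path[2:]):
        step_in = tuple(q - p for p, q in zip(a, b))
        step_out = tuple(q - p for p, q in zip(b, c))
        if step_in != step_out:
            bends += 1
    return bends
```

A straight path has no bends, while a staircase such as `[(0, 0), (1, 0), (1, 1), (2, 1)]` has two.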

Theorem 5. A minimum bend embedding with overlap l of a connected (directed
or undirected) graph on the integer lattice Z^2 under the budget constraint b can
be found in time […].

Theorem 6. A minimum bend embedding with overlap l of a connected (directed
or undirected) graph on the integer lattice Z^d under the budget constraint b can
be found in time 2^{O([…])}.



6.3 Minimum Maximum Edge Length Embedding or Layout 

The problem of minimum maximum edge length embedding or layout is to find an
embedding or layout of a graph in an integer lattice that minimizes the length of a
longest lattice path modeling an edge of the graph. In order to adapt our
algorithm for minimum total edge length, it is not sufficient just to change the
optimization criterion. When defining the subproblems, we also need to add
parameters specifying, for each path crossing the perimeter of the rectangle R,
the maximum allowed length of its part within R. The details are left to the
reader. These additional parameters do not change the asymptotic formula for
the number of subproblems. Hence, we obtain the following theorems.

Theorem 7. A minimum maximum edge length embedding with overlap l of a
connected (directed or undirected) graph on the integer lattice Z^2 under the
budget constraint b can be found in time […].

Theorem 8. A minimum maximum edge length embedding with overlap l of a
connected (directed or undirected) graph on the integer lattice Z^d under the
budget constraint b can be found in time […].

7 Final Remarks 

Many important NP-hard problems do not admit polynomial-time approximation
algorithms with reasonable approximation factors unless P = NP [3]. Also, for
some problems, e.g., protein folding, even “reasonable approximation factors”
are not good enough. Therefore, there is a growing interest in deriving
subexponential-time algorithms, i.e., algorithms operating in time 2^{o(n)},
for NP-hard problems. In this paper, we demonstrate that many important
problems of optimally embedding graphs in integer lattices admit
subexponential-time algorithms, by utilizing the existence of simple balanced
separators for graph embeddings that follows from the geometric nature of the
lattices.
Abstract. In this paper, we consider the problems of generating all maximal
(bipartite) cliques in a given (bipartite) graph G = (V, E) with n vertices and m
edges. We propose two algorithms for enumerating all maximal cliques. One runs
with O(M(n)) time delay and in O(n^2) space, and the other runs with O(Δ^4)
time delay and in O(n + m) space, where Δ denotes the maximum degree of G,
M(n) denotes the time needed to multiply two n × n matrices, and the latter one
requires O(nm) time as a preprocessing.

For a given bipartite graph G, we propose three algorithms for enumerating all
maximal bipartite cliques. The first algorithm runs with O(M(n)) time delay
and in O(n^2) space, which immediately follows from the algorithm for the
nonbipartite case. The second one runs with O(Δ^3) time delay and in O(n + m)
space, and the last one runs with O(Δ^2) time delay and in O(n + m + NΔ)
space, where N denotes the number of all maximal bipartite cliques in G, and both
algorithms require O(nm) time as a preprocessing.

Our algorithms improve upon all the existing algorithms when G is either dense
or sparse. Furthermore, computational experiments show that our algorithms for
sparse graphs perform significantly well on graphs that are generated randomly
and on graphs that appear in real-world problems.



1 Introduction 

Enumerating all configurations that satisfy a given specification is a fundamental and 
well-studied problem in combinatorics (see e.g., [13]). From both theoretical and prac- 
tical points of view, it has taken on increasing importance in many scientific fields such 
as artificial intelligence [10,20], graph theory [14,19,21], operations research [16], data 
mining [2,3], web mining [15], bioinformatics, and computational linguistics. There are 
several reasons to recognize enumeration as an important subject to study (see e.g., [13]). 
Among them, we here mention the following two reasons. 

One reason is that researchers have begun to study problems whose objective
functions and/or constraints are difficult to define mathematically. For such
problems, one of the simplest approaches is to first generate all the candidates
(polynomially many candidates, or as many candidates as computational resources
allow) and then choose one or a few of them according to a preference or
plausibility relation, which may be based on subjective intuition. For example, in
data mining, this procedure is usually used to find “interesting” objects, since it
is difficult to define the term “interesting.” Searching webpages by keywords is
another example: search engines usually output the pages that include all or some
of the keywords as candidates for the desired webpages.

The second reason is the recent increase in computational power. Twenty years
ago, computational power was too limited to enumerate all the candidates in
practical time, and even when this was possible, it was hard to handle the great
number of enumerated candidates. Today, we can handle over one million data
items, and such data can be enumerated in practical time by an efficient
algorithm. Hence, enumeration has been used to solve many real-world problems
in diverse areas.

This paper addresses the two problems of (1) generating all maximal cliques (equiv- 
alently, all maximal independent sets or all minimal vertex covers) of a given graph and 
(2) generating all maximal bipartite cliques of a given bipartite graph. Since cliques are 
fundamental graph objects, the problem of generating all maximal cliques is regarded as 
one of the central problems in the field of enumeration, and has attracted considerable 
attention in the past (e.g., [7,14,16,21]). The problems have not only theoretical inter- 
est, but also a number of potential applications in many areas (e.g., [15,2,3]). The next 
section presents two examples for generating all maximal bipartite cliques. 

In 1977, Tsukiyama et al. [21] first proposed an output-polynomial (or polynomial
total time) algorithm for generating all maximal cliques in a given graph G = (V, E)
that runs with O(nm) time delay (i.e., the computation time between any two
consecutive outputs is bounded by O(nm), and the first (resp., last) output also
occurs within O(nm) time after the start (resp., before the halt) of the algorithm)
and in O(n + m) space. Here n = |V| and m = |E|. Lawler et al. [16] generalized
this result (see [9] for further generalization). Chiba and Nishizeki [7] reduced
the time complexity to O(a(G)m), where a(G) is the arboricity of G with
m/(n - 1) ≤ a(G) ≤ O(m^{1/2}). Johnson et al. [14] proposed an algorithm which
enumerates all maximal cliques in lexicographical order. The algorithm runs with
O(nm) time delay, but it uses O(nN) space, where N denotes the number of all
maximal cliques of a given graph.

In this paper, we propose the following two algorithms for enumerating all maximal
cliques. The first one makes use of matrix multiplication, and runs with O(M(n))
time delay and in O(n^2) space, where M(n) is the time needed to multiply two
n × n matrices. Since it is known that matrix multiplication can be done in
O(n^{2.376}) time [8], our algorithm improves upon the previous algorithms for
dense graphs. For example, if a given graph has m = […] edges, our algorithm
dominates all the existing ones.

The second algorithm runs with O(Δ^4) time delay and in O(n + m) space, where Δ
is the maximum degree of G, and it additionally requires O(nm) time as a
preprocessing before generating the first maximal clique. This improves upon the
previous algorithms when a given graph G is sparse, e.g., n = Ω(Δ^{2+ε}) and
m = Ω(nΔ) for any ε > 0. More generally, we consider graphs G having θ vertices
with large degree (≥ Δ*). We propose an algorithm that runs with
O((Δ*)^{[…]}(Δ* + θ) + θ^{[…]}) time delay and in O((n + N*)θ + m) space,
where N* denotes the number of all maximal cliques of the subgraph of G induced
by the vertices with large degree, and O(nm) time is required as a preprocessing.
This algorithm is motivated by practical applications such as web networks, since
the graphs obtained from those applications usually have a few vertices with large
degree. In this paper, we implement our second algorithm and compare it with the
algorithm of Tsukiyama et al. by using graphs that are generated randomly and
graphs that appear in real-world problems. We show that our algorithm is much
faster than the algorithm of Tsukiyama et al.

Listing all maximal bipartite cliques is also well studied (see e.g., [5,11,17,18]).
Let us first note that the generation of all maximal bipartite cliques in a bipartite
graph G = (V_1 ∪ V_2, E) can be seen as that of all maximal cliques in the graph G̃
obtained from G by adding edges so that V_1 and V_2 both become cliques. This
implies that the algorithms above are applicable to generate all maximal bipartite
cliques. In particular, our algorithm that makes use of matrix multiplication
improves upon all the existing algorithms for dense graphs. However, if we consider
practical applications, we need to develop algorithms for sparse bipartite graphs.
Note that G̃ might be dense even if G is sparse. In this paper, we propose two
algorithms for sparse bipartite graphs. The first one runs with O(Δ^3) time delay
and in O(n + m) space, and the second one runs with O(Δ^2) time delay and in
O(n + m + NΔ) space. Here both algorithms additionally require O(nm) time as a
preprocessing. As in the nonbipartite case, these algorithms improve upon
previous algorithms for sparse graphs and perform well in computational
experiments.
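The reduction from bipartite cliques to cliques can be sketched in Python as follows (hypothetical function name; graphs represented as adjacency sets, which is an assumption of this sketch). It also makes the density remark concrete: the construction adds |V_1|(|V_1| - 1)/2 + |V_2|(|V_2| - 1)/2 edges no matter how sparse the bipartite graph is.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def complete_both_sides(v1: Iterable[int], v2: Iterable[int],
                        edges: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Adjacency sets of the graph obtained from the bipartite graph
    (V1 ∪ V2, E) by making both V1 and V2 cliques."""
    side1: List[int] = list(v1)
    side2: List[int] = list(v2)
    adj: Dict[int, Set[int]] = {v: set() for v in side1 + side2}
    for u, w in edges:                       # original bipartite edges
        adj[u].add(w)
        adj[w].add(u)
    for side in (side1, side2):              # new edges inside each side
        for u, w in combinations(side, 2):
            adj[u].add(w)
            adj[w].add(u)
    return adj
```

With V_1 = {1, 2}, V_2 = {3, 4} and the single edge (1, 3), the result has three edges: the original one plus (1, 2) and (3, 4).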

The rest of the paper is organized as follows. In Section 2, we present some examples 
of applications for our problems. Section 3 provides some preliminaries and introduces 
notation. Section 4 explains the algorithms of Tsukiyama et al. and Johnson et al. Section 
5 presents an algorithm which uses matrix multiplication, and Sections 6 and 7 consider 
the problem for enumerating all maximal cliques and maximal bipartite cliques for sparse 
graphs, respectively. Section 8 shows some results of computational experiments. 

Due to the space limitation, some proofs are omitted. 



2 Applications of Maximal Clique Enumeration 

In this section, we present two examples of applications of generating all maximal
bipartite cliques. Other applications can be found, for example, in the context of
concept lattices [12] and in artificial intelligence.

2.1 Web Communities 

Consider a directed graph G = (V, A) (called a web network) whose vertices and arcs
correspond to web pages and their links, respectively. Kumar et al. [15] regarded
directed bipartite cliques (S_1, S_2) (i.e., S_1 × S_2 ⊆ A) of G as communities of
web pages, i.e., the web pages in S_2 may have similar topics and the web pages in
S_1 may have interests in these topics, and considered generating directed bipartite
cliques of G. They first construct a graph G* with about 5,000,000 arcs by removing
unnecessary vertices and arcs from G, and then enumerate all directed bipartite
cliques in the reduced graph G*. They show, by checking them by hand, that directed
bipartite cliques usually contain similar topics. However, since G* contains a great
number of bipartite cliques, they could enumerate only those containing at most 10
vertices.

In this setting, it is natural to regard maximal directed bipartite cliques as good
representatives of communities. From a directed graph G = (V, A), let us construct a
bipartite (undirected) graph G̃ = (V ∪ V', E) such that V' (= {v' | v ∈ V}) is a copy
of V and (v, u') ∈ E if and only if (v, u) ∈ A. Then there exists a one-to-one
correspondence between directed bipartite cliques in G and bipartite cliques in G̃.
Hence, our algorithms are applicable to generate all maximal directed bipartite
cliques in G*.
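The construction of the bipartite graph from a directed graph can be sketched in Python as follows. The tags `"out"` for a vertex v ∈ V and `"in"` for its copy v' ∈ V' are a hypothetical encoding chosen for this sketch. Under it, a directed bipartite clique (S_1, S_2) becomes the bipartite clique formed by the `"out"` copies of S_1 and the `"in"` copies of S_2.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Tuple[str, Hashable]   # ("out", v) stands for v in V, ("in", v) for v'


def directed_to_bipartite(vertices: Iterable[Hashable],
                          arcs: Iterable[Tuple[Hashable, Hashable]]
                          ) -> Dict[Node, Set[Node]]:
    """Undirected bipartite graph with sides V and a copy V' of V, where
    (v, u') is an edge exactly when (v, u) is an arc."""
    adj: Dict[Node, Set[Node]] = {}
    for v in vertices:
        adj[("out", v)] = set()
        adj[("in", v)] = set()
    for v, u in arcs:
        adj[("out", v)].add(("in", u))
        adj[("in", u)].add(("out", v))
    return adj
```

For the arcs (1, 2), (1, 3), (4, 2), (4, 3), every `"out"` copy of 1 and 4 is adjacent to every `"in"` copy of 2 and 3, matching the directed bipartite clique ({1, 4}, {2, 3}).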

2.2 Closed Item Sets 

Let I be a set of items and 𝒯 be a family of sets in I (i.e., 𝒯 ⊆ 2^I), where T ∈ 𝒯 is
called a transaction. For a given constant α, a subset S of I is called an (α-)frequent
set if at least α transactions of 𝒯 include S. In data mining, frequent sets are regarded
as characterizing the database 𝒯, and the enumeration of all frequent sets is studied in
order to find association rules from 𝒯, which is one of the main topics in data mining
(e.g., [2,3]). However, since a database contains a great number of frequent sets if α is
small, many researchers started studying the enumeration of all closed item sets instead
of all frequent sets (e.g., [5,17,18,25]). Here a frequent set S of I is called a closed item
set if there is no other superset S' of S such that S' ⊆ T for every T ∈ 𝒯 with S ⊆ T.
Note that the number of closed item sets in a database is usually much smaller than that
of frequent sets.

Pasquier et al. [17,18] proposed algorithms based on backtracking (and pruning
unnecessary branches) to enumerate all closed item sets. Their experimental results show
that, if α is large, the number of closed item sets is quite small (up to about 100,000), and
hence the algorithms are fast. However, since the algorithms are not output-polynomial,
they are not useful if, for example, there are a great many closed item sets.

For a set of transactions 𝒯 ⊆ 2^I, we construct a bipartite graph G_𝒯 = (V_1 ∪ V_2, E)
by V_1 = 𝒯, V_2 = I and (u, v) ∈ E if and only if u ∈ V_1 includes v ∈ V_2. Zaki and
Ogihara [25] showed that there exists a one-to-one correspondence between closed item
sets of 𝒯 and maximal bipartite cliques in G_𝒯. Hence, our algorithms can enumerate all
closed item sets in polynomial time delay. Since the graph G_𝒯 constructed from a
database 𝒯 is usually sparse, our algorithms for sparse graphs have been shown [24] to
work well.
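The closedness condition above can be checked directly. A set S contained in at least one transaction is closed exactly when S equals the intersection of all transactions that include it, since any item in that intersection but not in S would give a larger S' contained in every such transaction. The Python sketch below uses hypothetical helper names. Under the bipartite construction, a closed S corresponds to the bipartite clique formed by S together with the transactions that include it.

```python
from typing import FrozenSet, List, Optional

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: List[Itemset]) -> int:
    """Number of transactions that include the item set."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset: Itemset, transactions: List[Itemset]) -> Optional[Itemset]:
    """Intersection of all transactions that include `itemset`
    (None if no transaction does)."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def is_closed(itemset: Itemset, transactions: List[Itemset]) -> bool:
    """S is closed iff no proper superset lies in every transaction that
    contains S, i.e., iff S equals its closure."""
    return closure(itemset, transactions) == itemset
```

With the transactions {a, b, c}, {a, b}, {a, c}, the set {b} is not closed (every transaction containing b also contains a), whereas {a} and {a, b} are.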



3 Definitions and Notations 

This section introduces some notions and notations of graphs used in the subsequent 
sections. 

Let G = (V, E) be a graph with a vertex set V = {v_1, ..., v_n} and an edge set
E = {e_1, ..., e_m}. If there is a partition V_1 and V_2 of V such that no two vertices
in V_i, i = 1, 2, are adjacent, then G is called bipartite and denoted by
G = (V_1 ∪ V_2, E). Throughout this paper, we assume without loss of generality that
G is simple and connected, since we deal with clique generation problems. We denote
by A the adjacency matrix of G, i.e., A is an n × n matrix such that its element
a_{ij} = 1 if (i, j) ∈ E, and a_{ij} = 0 otherwise. For a vertex subset S ⊆ V, χ(S)
denotes the characteristic vector of S, i.e., the ith element of χ(S) is 1 if v_i ∈ S,
and 0 otherwise.

For a vertex v of G, let Γ(v) = {u ∈ V | (u, v) ∈ E} and δ(v) = |Γ(v)|. We call
Γ(v) the neighbor of v, and δ(v) the degree of v. We denote by Δ the maximum degree
of G. Similarly, for a vertex set S, let Γ(S) = {u ∈ V \ S | (u, v) ∈ E for some v ∈ S},
and Γ(S) is called the neighbor of S. Let A(S) be the set of all v ∈ V \ S such that
(v, u) ∈ E for every u ∈ S. By definition, we have A(S) ⊆ Γ(S) (⊆ V \ S). For a vertex
set S and an index i, let S_{≤i} = S ∩ {v_1, ..., v_i}. For two vertex sets X and Y, we
say X is lexicographically larger than Y if the smallest vertex (i.e., a vertex with the
smallest index) in (X \ Y) ∪ (Y \ X) is contained in X.

A vertex set K ⊆ V is called a clique if any two vertices in K are adjacent, and
a maximal clique if, in addition, no other clique contains K. For a clique K, let C(K)
denote the maximal clique that is the lexicographically largest among all maximal cliques
containing K. It is clear that C(K) is not lexicographically smaller than K. For a bipartite
graph G = (V_1 ∪ V_2, E), a vertex set K is called a bipartite clique if every vertex in
K ∩ V_1 is adjacent to every vertex in K ∩ V_2, and maximal if, in addition, no other
bipartite clique contains K.
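Both definitions can be sketched in a few lines of Python, assuming (for this sketch only) that vertices are the integers 1, ..., n and that the graph is a dict of adjacency sets; the function names are hypothetical. C(K) is obtained greedily: scanning vertices in increasing index order and adding each vertex adjacent to all vertices chosen so far always adds the smallest-index vertex that can still be added, so no maximal clique containing K can beat the result at its first point of difference. This simple version is not tuned to the O(m) bound used in the next section.

```python
from typing import Dict, FrozenSet, Iterable, Set

Graph = Dict[int, Set[int]]   # vertex index -> set of neighbours


def lex_larger(x: Set[int], y: Set[int]) -> bool:
    """True iff X is lexicographically larger than Y: the smallest vertex of
    the symmetric difference of X and Y lies in X."""
    diff = x ^ y
    return bool(diff) and min(diff) in x


def lex_largest_maximal_clique(g: Graph, k: Iterable[int]) -> FrozenSet[int]:
    """C(K) for a clique K: scan vertices in increasing index order and add
    every vertex that is adjacent to all vertices chosen so far."""
    clique = set(k)
    for v in sorted(g):
        if v not in clique and clique <= g[v]:
            clique.add(v)
    return frozenset(clique)
```

For the triangle {1, 2, 3} with a pendant vertex 4 attached to 3, C(∅) = {1, 2, 3} and C({4}) = {3, 4}.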

4 Basic Algorithms 

In this section, we explain the algorithms of Tsukiyama et al. [21] and Johnson et al. [14].
We view their algorithms as the enumeration algorithms based on reverse search, where 
reverse search was introduced by Avis and Fukuda [4] to solve enumeration problems 
efficiently. Note that our presentation of their algorithms is quite different from theirs 
[21,14], which may be of independent interest. 

Let K_0 denote the maximal clique that is the lexicographically largest among all
maximal cliques. For a maximal clique K (≠ K_0), we define the parent P(K) of K as
C(K_{≤i-1}), where i is the maximum index satisfying C(K_{≤i-1}) ≠ K. Such an index
i is called the parent index, denoted by i(K). Note that they are well defined, since
K ≠ C(K_{≤0}) holds by K ≠ K_0. Since P(K) is lexicographically larger than K, this
parent-child binary relation on maximal cliques is acyclic, and creates an in-tree rooted
at K_0.

Lemma 1. The parent-child relation constructs an in-tree rooted at K_0. □

We call this in-tree the enumeration tree for the maximal cliques of a graph $G$. Both algorithms [14,21] traverse this enumeration tree. In order to traverse the enumeration tree, we have to compute the parent and the children of a given maximal clique efficiently.

It is not difficult to see that the parent $P(K)$ is computable from a maximal clique $K$ in linear time. However, it is less trivial to compute the children of $K$. For a maximal clique $K$ and an index $i$, we define

$$K[i] = C\big((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}\big). \qquad (1)$$
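Equation (1) is a single closure call on a seed set. A minimal sketch, again assuming the greedy computation of $C(\cdot)$ and adjacency sets over vertices $1, \ldots, n$; `closure` and `k_of_i` are hypothetical names.

```python
def closure(K, adj, n):
    """Greedy C(K): add v_1, ..., v_n in order when adjacent to all chosen."""
    C = set(K)
    for v in range(1, n + 1):
        if v not in C and C <= adj[v]:
            C.add(v)
    return C


def k_of_i(K, i, adj, n):
    """K[i] = C((K_{<=i} intersect Gamma(v_i)) union {v_i}), Eq. (1)."""
    seed = {v for v in K if v <= i and v in adj[i]} | {i}
    return closure(seed, adj, n)


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(k_of_i({1, 2, 3}, 4, adj, 4))  # {3, 4}
```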



Lemma 2. Let $K$ and $K'$ be maximal cliques in $G$. Then $K'$ is a child of $K$ if and only if $K' = K[i]$ holds for some $i$ such that

(a) $v_i \notin K$.

(b) $i > i(K)$.

(c) $K[i]_{\le i-1} = K_{\le i} \cap \Gamma(v_i)$.

(d) $K_{\le i} = C(K_{\le i} \cap \Gamma(v_i))_{\le i}$.

Moreover, if an index $i$ satisfies (a)-(d), then $i$ is the parent index of $K[i]$. □

Since $C(K)$ can be computed from a clique $K$ in $O(m)$ time, Lemma 2 allows us to compute all children of a given maximal clique in $O(nm)$ time. Therefore, we can traverse the enumeration tree efficiently.
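Lemma 2 yields a direct, if slow, way to list the children of a maximal clique. The sketch below assumes adjacency sets `adj[v]` over vertices $1, \ldots, n$, computes $C(\cdot)$ greedily, and treats the parent index of the root $K_0$ as 0 (an assumption, since $i(K_0)$ is not defined in the text); all function names are hypothetical. It recomputes closures from scratch and makes no attempt at the $O(nm)$ bound.

```python
def closure(K, adj, n):
    """Greedy C(K) over vertices 1..n."""
    C = set(K)
    for v in range(1, n + 1):
        if v not in C and C <= adj[v]:
            C.add(v)
    return C


def parent_index(K, adj, n):
    """i(K): largest i with C(K_{<=i-1}) != K; 0 for the root K_0."""
    for i in range(n, 0, -1):
        if closure({v for v in K if v <= i - 1}, adj, n) != K:
            return i
    return 0


def children(K, adj, n):
    """All children K[i] of maximal clique K, via conditions (a)-(d) of Lemma 2."""
    result = []
    for i in range(parent_index(K, adj, n) + 1, n + 1):  # (b) i > i(K)
        if i in K:                                         # (a) v_i not in K
            continue
        base = {v for v in K if v <= i and v in adj[i]}    # K_{<=i} & Gamma(v_i)
        Ki = closure(base | {i}, adj, n)                   # K[i], Eq. (1)
        if {v for v in Ki if v < i} != base:               # (c)
            continue
        d_left = {v for v in K if v <= i}
        d_right = {v for v in closure(base, adj, n) if v <= i}
        if d_left != d_right:                              # (d)
            continue
        result.append(Ki)
    return result


# Triangles {1, 2, 3} and {2, 3, 4} plus the pendant edge 4-5.
adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(children({1, 2, 3}, adj, 5))  # [{2, 3, 4}]
print(children({2, 3, 4}, adj, 5))  # [{4, 5}]
```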

The algorithm of Tsukiyama et al. traverses the enumeration tree in a depth-first manner. It starts with the root $K_0$ and finds its children recursively. It is not difficult to see that the algorithm requires $O(nm)$ time delay and $O(n + m)$ space.
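Given any routine that lists the children of a maximal clique (such as one built on Lemma 2), the traversal described above is a preorder walk. The sketch takes a hypothetical `children` callback and uses Python recursion, so very deep enumeration trees would need a raised recursion limit or an explicit stack.

```python
def depth_first(K, children):
    """Preorder (depth-first) walk of the enumeration tree rooted at K."""
    yield K
    for child in children(K):
        yield from depth_first(child, children)


# Enumeration tree of the 4-cycle 1-2-3-4-1, worked out by hand from Lemma 2.
tree = {
    frozenset({1, 2}): [frozenset({2, 3}), frozenset({1, 4})],
    frozenset({2, 3}): [frozenset({3, 4})],
}
walk = depth_first(frozenset({1, 2}), lambda K: tree.get(K, []))
print([sorted(K) for K in walk])  # [[1, 2], [2, 3], [3, 4], [1, 4]]
```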

The algorithm of Johnson et al. enumerates all maximal cliques in lexicographically decreasing order. It initializes a queue $Q$ as $Q = \{K_0\}$, then iteratively extracts the lexicographically largest element $K$ from $Q$ and inserts into $Q$ all the children of $K$, which are lexicographically smaller than $K$. The time complexity of their algorithm is the same as that of the algorithm of Tsukiyama et al.; however, it needs $O(nN + m)$ space, where $N$ denotes the number of all maximal cliques.
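The lexicographically decreasing order can be obtained with a binary heap keyed on the lexicographic order. A sketch, assuming vertices $1, \ldots, n$ and a hypothetical `children` callback; the integer key treats $v_1$ as the most significant bit, so a larger key means a lexicographically larger set. Since every child is lexicographically smaller than its parent, always popping the maximum yields the cliques in decreasing order, and the pending cliques held in the heap account for the extra space.

```python
import heapq


def lex_key(S, n):
    """Integer key with v_1 as the most significant bit (vertices 1..n)."""
    return sum(1 << (n - v) for v in S)


def lex_decreasing(root, children, n):
    """Pop the lexicographically largest clique, output it, push its children.

    heapq is a min-heap, so keys are negated to obtain a max-priority queue.
    """
    Q = [(-lex_key(root, n), tuple(sorted(root)))]
    order = []
    while Q:
        _, K = heapq.heappop(Q)
        order.append(set(K))
        for child in children(set(K)):
            heapq.heappush(Q, (-lex_key(child, n), tuple(sorted(child))))
    return order


# Enumeration tree of the 4-cycle 1-2-3-4-1, worked out by hand.
tree = {(1, 2): [{2, 3}, {1, 4}], (2, 3): [{3, 4}]}
print(lex_decreasing({1, 2}, lambda K: tree.get(tuple(sorted(K)), []), 4))
# [{1, 2}, {1, 4}, {2, 3}, {3, 4}]
```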

5 Using Matrix Multiplication 

In this section, we describe an algorithm that runs with $O(M(n))$ time delay and in $O(n^2)$ space, where $M(n)$ denotes the time needed to multiply two $n \times n$ matrices. The algorithm uses matrix multiplication to find all children of a maximal clique when we traverse the enumeration tree.

Let us start by restating conditions (c) and (d) in Lemma 2.

Lemma 3. Let $K$ be a maximal clique in $G$. Then an index $i$ satisfies (c) if and only if no index $j$ satisfies the following three conditions.

(c-1) $j < i$.

(c-2) $v_j \notin K_{\le i} \cap \Gamma(v_i)$.

(c-3) $v_j$ is adjacent to all vertices in $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}$. □



Lemma 4. Let $K$ be a maximal clique in $G$. Then an index $i$ satisfies (d) if and only if no index $j$ satisfies the following four conditions.

(d-1) $j < i$.

(d-2) $v_j \notin K$.

(d-3) $v_j$ is adjacent to all vertices in $K_{\le j}$.

(d-4) $v_j$ is adjacent to all vertices in $K_{\le i} \cap \Gamma(v_i)$. □

Let us now consider computing all indices $i$ such that $K[i]$ is a child of $K$. We denote by $I_a$, $I_b$, $I_c$, and $I_d$ the sets of indices that satisfy conditions (a)-(d) in Lemma 2, respectively. It is clear that $I_a$ can be constructed from $K$ in $O(n)$ time and $O(n)$ space. Since $i(K)$ can be computed in $O(n + m)$ time, $I_b$ can be constructed in $O(n + m)$ time and $O(n + m)$ space. From Lemma 3, we can compute $I_c$ as follows.

For $\ell = 1, 2, 3$, let $Q_{(c\text{-}\ell)}$ be the $n \times n$ matrix whose $(i, j)$ element is 1 if $i$ and $j$ satisfy (c-$\ell$) in Lemma 3, and 0 otherwise. Then it is clear that $Q_{(c\text{-}1)}$ and $Q_{(c\text{-}2)}$ can be computed in $O(n^2)$ time and $O(n^2)$ space. However, we need $O(n^3)$ time to compute $Q_{(c\text{-}3)}$ if a naive method is applied. In order to compute $Q_{(c\text{-}3)}$ efficiently, let $Q$ be the matrix whose $i$th row is $\chi((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\})$, where $\chi(S)$ denotes the characteristic vector of a set $S \subseteq V$. Then the $(i, j)$ element of $QA$ is the inner product of $\chi((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\})$ and $\chi(\Gamma(v_j))$, where we recall that $A$ denotes the adjacency matrix of $G$; hence it equals $|((K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}) \cap \Gamma(v_j)|$. This element is equal to $|(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\}|$ if and only if $v_j$ satisfies condition (c-3) in Lemma 3. Thus $Q_{(c\text{-}3)}$ can be obtained in $O(M(n))$ time and $O(n^2)$ space by computing $Q \cdot A$. Since $i \in I_c$ exactly when no $j$ has a 1 in position $(i, j)$ of all three matrices, this implies that $I_c$ can be constructed in $O(M(n))$ time and $O(n^2)$ space.
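The matrix formulation of condition (c-3) can be checked on a small example. This sketch assumes adjacency sets over vertices $1, \ldots, n$ and stores index $i$ at list position $i-1$; it uses a naive cubic product `matmul` (a hypothetical helper) where a fast matrix-multiplication routine would be needed for the $O(M(n))$ bound, and then reads off $I_c$ via Lemma 3.

```python
def matmul(X, Y):
    """Naive matrix product; a fast routine would give the O(M(n)) bound."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def c3_matrix(K, adj, n):
    """Q_(c-3)[i-1][j-1] = 1 iff v_j is adjacent to every vertex of
    S_i = (K_{<=i} intersect Gamma(v_i)) union {v_i}."""
    A = [[1 if j in adj[i] else 0 for j in range(1, n + 1)] for i in range(1, n + 1)]
    S = [{v for v in K if v <= i and v in adj[i]} | {i} for i in range(1, n + 1)]
    Q = [[1 if j in S[i - 1] else 0 for j in range(1, n + 1)] for i in range(1, n + 1)]
    QA = matmul(Q, A)  # entry (i, j) = |S_i intersect Gamma(v_j)|
    return [[1 if QA[r][c] == len(S[r]) else 0 for c in range(n)] for r in range(n)]


def indices_c(K, adj, n):
    """I_c via Lemma 3: i qualifies iff no j meets (c-1), (c-2) and (c-3)."""
    Q3 = c3_matrix(K, adj, n)
    I = set()
    for i in range(1, n + 1):
        base = {v for v in K if v <= i and v in adj[i]}
        if not any(j < i and j not in base and Q3[i - 1][j - 1]
                   for j in range(1, n + 1)):
            I.add(i)
    return I


adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(indices_c({1, 2, 3}, adj, 5))  # {1, 2, 3, 4}
```

Here index 5 is rejected because $v_4$ is adjacent to $v_5$, which is the whole set $(K_{\le 5} \cap \Gamma(v_5)) \cup \{v_5\} = \{v_5\}$.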

Similarly, $I_d$ can be constructed in $O(M(n))$ time and $O(n^2)$ space.

Therefore we have the following lemma. 

Lemma 5. Let $K$ be a maximal clique of a graph $G$, and let $I$ denote the set of all indices $i$ such that $K[i]$ is a child of $K$, i.e., $I = I_a \cap I_b \cap I_c \cap I_d$. Then $I$ can be computed in $O(M(n))$ time and $O(n^2)$ space. □

We are now ready to describe our algorithm formally. 

Algorithm AllMaxCliques
Input: A graph $G = (V, E)$.
Output: All maximal cliques of $G$.

Step 1. Compute the lexicographically largest maximal clique $K_0$ of $G$.

Step 2. Call AllChildren($K_0$) and halt. □

Procedure AllChildren($K$) /* $K$ is a maximal clique in $G$. */

Step 1. Output $K$ and compute the set $I$ of all indices $i$ such that $K[i]$ is a child of $K$.

Step 2. For each $i \in I$ do
    Compute $K[i]$ and call AllChildren($K[i]$).
end.

Step 3. Return. □



Theorem 1. For a given graph $G = (V, E)$, we can generate all maximal cliques of $G$ with $O(M(n))$ time delay and in $O(n^2)$ space. □

6 Algorithms for Sparse Graphs 

In many practical applications, the given graph $G$ is sparse and only a few vertices have large degree. Such examples can be found in web networks [1]. In such cases, $\Omega(n)$ time delay is not efficient enough. We first consider the simplest case, in which all vertices have small degree, i.e., $\Delta$ is small. We develop an algorithm for generating all maximal cliques with $O(\Delta^4)$ time delay and in $O(n + m)$ space, where $O(nm)$ time is required as a preprocessing.

Since $|K| \le \Delta + 1$ holds for any clique $K$, given a clique $K$ we can compute $C(K)$ in $O(\Delta^2)$ time and $O(n + m)$ space by repeatedly augmenting $K$. Therefore, we can compute $K[i]$ and $P(K)$ in $O(\Delta^2)$ time and $O(n + m)$ space.

The following lemma shows that any maximal clique $K$ $(\neq K_0)$ has $O(\Delta^2)$ children.

Lemma 6. For a maximal clique $K$ $(\neq K_0)$, let $K'$ be a child of $K$. Then $v_{i(K')} \in \Gamma(K_{\le i(K')})$ holds. □

Note that $K_0$ in general has $\Omega(n)$ children; hence we compute them in $O(nm)$ time as a preprocessing.

Let us now describe an algorithm that runs with $O(\Delta^4)$ time delay and in $O(n + m)$ space. The algorithm is similar to AllMaxCliques in Section 5, but differs in the following two points.

First, we do not construct $I$ in Step 1 of Procedure AllChildren. If we stored $I$, we would require $O(n^2)$ space in general, since we need $O(n)$ space for each $I$ and the depth of the recursion is $O(n)$. Instead, we check whether $K[i]$ is a child of $K$ in increasing order of $i$, and store only the current $i$. This reduces the space to $O(n)$.

Second, we do not always output a maximal clique $K$ before recursively calling Procedure AllChildren. By Lemma 6, Step 1 of Procedure AllChildren checks only $O(\Delta^2)$ indices $i$ if $K \neq K_0$. Since each check can be performed in $O(\Delta^2)$ time, AllChildren requires $O(\Delta^4)$ time, not counting its recursive calls. Thus, without modification, the algorithm would run with $O(n\Delta^4)$ time delay, since the depth of the recursion is $O(n)$. To reduce the delay, Procedure AllChildren outputs $K$ before all its recursive calls if the depth of the current recursion is odd, and after all its recursive calls otherwise. Although we skip the details due to the space limitation (see [23]), this reduces the delay to $O(\Delta^4)$.
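The depth-parity rule is easy to see on a toy tree. In this sketch the root is taken to be at depth 1 (an assumption, since the text does not fix the convention), and `children` is any callback listing child nodes. Every node is still output exactly once; only the position of the output relative to the recursive calls changes. The delay analysis itself is in [23] and is not reproduced here.

```python
def alternating_output(root, children):
    """Output order under the depth-parity rule: a node at odd recursion
    depth is output before its recursive calls, a node at even depth after
    them (the root is taken to be at depth 1)."""
    out = []

    def visit(K, depth):
        if depth % 2 == 1:
            out.append(K)
        for child in children(K):
            visit(child, depth + 1)
        if depth % 2 == 0:
            out.append(K)

    visit(root, 1)
    return out


tree = {"a": ["b", "c"], "b": ["d"]}
print(alternating_output("a", lambda K: tree.get(K, [])))  # ['a', 'd', 'b', 'c']
```

On a long chain the outputs alternate between going down and coming back up, so no long run of recursive calls passes without an output.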

Theorem 2. For a given graph $G = (V, E)$, all maximal cliques of $G$ can be generated with $O(\Delta^4)$ time delay and in $O(n + m)$ space, where $O(nm)$ time is required as a preprocessing. □

We next consider a more general case. Let $G = (V = \{v_1, \ldots, v_n\}, E)$ be a graph such that $\delta(v_i) \le \Delta^*$ $(\ll \Delta)$ holds for $i = 1, \ldots, n - \theta$. Namely, only $\theta$ vertices in $G$ have large degree $(> \Delta^*)$. Let $V^* = \{v_{n-\theta+1}, \ldots, v_n\}$, and let $G[V^*]$ denote the subgraph of $G$ induced by $V^*$. We divide the family $\mathcal{C}$ of all maximal cliques into two subfamilies $\mathcal{C}_1$ and $\mathcal{C}_2$, where $\mathcal{C}_1$ consists of all maximal cliques that are contained in $V^*$ and $\mathcal{C}_2 = \mathcal{C} \setminus \mathcal{C}_1$. Our algorithm first generates all maximal cliques in the graph $G[V^*]$ and keeps them in memory. This can be done in $O(\theta^3 N^*)$ time by preparing the adjacency matrix of $G[V^*]$ as a preprocessing, where $N^*$ denotes the number of all maximal cliques in $G[V^*]$. Note that this generates all maximal cliques in $\mathcal{C}_1$, but may also generate some non-maximal cliques of $G$. Therefore, we remove them after generating all maximal cliques in $\mathcal{C}_2$. We remark that each non-maximal clique of $G$ generated in this way is contained in a maximal clique in $\mathcal{C}_2$, but no maximal clique in $\mathcal{C}_2$ contains more than one maximal clique of $G[V^*]$.

Formally our algorithm can be described as follows. 

Algorithm AllMaxCliques*
Input: A graph $G = (V, E)$ such that the degree of $v_i$ $(i = 1, \ldots, n - \theta)$ is at most $\Delta^*$.
Output: All maximal cliques of $G$.

Step 1. Generate all maximal cliques in the graph $G[V^*]$ and store them in $Q$.

Step 2. Compute the lexicographically largest maximal clique $K_0$ of $G$.

Step 3. Call AllChildren*($K_0$), output all sets in $Q$, and halt. □

Procedure AllChildren*($K$) /* $K$ is a maximal clique of $G$ contained in $\mathcal{C}_2$. */

Step 1. If $K$ contains a clique $K'$ in $Q$, then remove $K'$ from $Q$.
  /* $K$ contains at most one clique in $Q$, which is not a maximal clique of $G$. */

Step 2. Output $K$, compute $I = \{i \mid v_i \in \Gamma(K_{\le i})\}$, and let $I^* := \emptyset$.
  /* $I$ is the set of candidates $i$ such that $K[i]$ is a child of $K$. */

Step 3. For each $i \in I$ do
    if $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \not\subseteq V^*$
    then begin check conditions (a)-(d) in Lemma 2;
      if they are satisfied then $I^* := I^* \cup \{i\}$.
    end.

Step 4. For each $i \in I^*$ do
    Compute $K[i]$ and call AllChildren*($K[i]$).
  /* Note that $K[i] \not\subseteq V^*$, since $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \not\subseteq V^*$. */

Step 5. Return. □

Let us show the correctness of the algorithm via a series of lemmas. 

Lemma 7. Let $K$ be a maximal clique that is contained in $V^*$. Then any descendant of $K$ is contained in $V^*$. □

By this lemma, $\mathcal{C}_2$ (i.e., the set of all maximal cliques containing a vertex in $V \setminus V^*$) forms a connected component of the enumeration tree that contains $K_0$. When we generate all maximal cliques in $\mathcal{C}_2$, we need not traverse any descendant of a maximal clique $K$ contained in $V^*$. Therefore, Procedure AllChildren* makes sure that $K[i] \not\subseteq V^*$ before going into the recursion in Step 4.

The next lemma, together with Lemma 6, shows that the number of candidates $i$ such that $K[i]$ is a child of $K$ $(\neq K_0)$ is small (see $I$ in Step 2 of AllChildren*).

Lemma 8. Let $K$ $(\neq K_0)$ be a maximal clique that contains a vertex in $V \setminus V^*$. Then we have $|\{i \mid v_i \in \Gamma(K_{\le i})\}| \le (\Delta^* + 1)(\Delta^* + \theta)$. □

Let us then consider constructing K[i] from K and i in Step 4 of AllChildren*. 

Lemma 9. Let $K$ be a clique including a vertex $v \in V \setminus V^*$. Then $C(K)$ can be computed in $O((\Delta^*)^2)$ time and $O(n\theta)$ space. □

By this lemma, if $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \not\subseteq V^*$, then we only need $O((\Delta^*)^2)$ time to construct $K[i]$ and to check conditions (a)-(d) in Lemma 2. However, $\Omega(n)$ time is needed to construct $K[i]$ if $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \subseteq V^*$. The following lemma overcomes this difficulty.

Lemma 10. Let $K$ be a maximal clique in $G$. If $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \subseteq V^*$, then $K[i]$ is either not a child of $K$ or a maximal clique contained in $V^*$. □

Based on this lemma, Step 3 of AllChildren* checks whether $(K_{\le i} \cap \Gamma(v_i)) \cup \{v_i\} \not\subseteq V^*$ before checking the conditions in Lemma 2.

We are now ready to present our theorem.

Theorem 3. Let $G$ be a graph with $n - \theta$ vertices of degree at most $\Delta^*$. Then all maximal cliques of $G$ can be enumerated with amortized $O((\Delta^*)^3(\Delta^* + \theta) + \theta^3)$ time delay and in $O((n + N^*)\theta + m)$ space, where $N^*$ denotes the number of all maximal cliques in $G[V^*]$, and $O(nm)$ time is required as a preprocessing. □

We remark that $\theta$ is small in practical cases. For example, we have $\theta < \log n$ in web networks, whose degree distributions follow the so-called power law [1,15]. Therefore, the memory required in such applications is not large.

7 Enumeration of All Maximal Bipartite Cliques 

In this section we consider enumerating maximal bipartite cliques in a bipartite graph. 

For a bipartite graph $G = (V_1 \cup V_2, E)$, let $V_1 = \{v_1, \ldots, v_m\}$ and $V_2 = \{v_{m+1}, \ldots, v_n\}$. We assume without loss of generality that no vertex $v$ satisfies $\Gamma(v) = V_1$ or $\Gamma(v) = V_2$. Recall that generating all maximal bipartite cliques in $G$ can be regarded as generating all maximal cliques in the graph $\tilde{G}$ obtained from $G$ by adding edges so that $V_1$ and $V_2$ both become cliques. We denote by $\tilde{\Gamma}$ and $\tilde{C}$ the counterparts of $\Gamma$ and $C$ for $\tilde{G}$; e.g., for a vertex $v$, $\tilde{\Gamma}(v) = \Gamma(v) \cup V_1$ if $v \in V_1$, and $\tilde{\Gamma}(v) = \Gamma(v) \cup V_2$ if $v \in V_2$. We frequently use $\tilde{\Gamma}$ and $\tilde{C}$ instead of $\Gamma$ and $C$ so that the results obtained in the previous sections carry over. For example, we define $K[i]$ by

$$K[i] = \tilde{C}\big((K_{\le i} \cap \tilde{\Gamma}(v_i)) \cup \{v_i\}\big). \qquad (2)$$
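Building $\tilde{G}$ from $G$ is a small adjacency transformation. A sketch, assuming adjacency sets keyed by vertex and the hypothetical name `complete_sides`; unlike the formula for $\tilde{\Gamma}(v)$ above, it leaves $v$ out of its own neighbourhood so that the result is a simple graph, which any maximal-clique routine can then process.

```python
def complete_sides(adj, V1, V2):
    """Adjacency sets of G~: the edges of G plus all pairs inside V1 and
    inside V2.  No self-loops are added."""
    tilde = {v: set(adj[v]) for v in adj}
    for side in (V1, V2):
        for v in side:
            tilde[v] |= set(side) - {v}
    return tilde


# V1 = {1, 2}, V2 = {3, 4}; bipartite edges 1-3, 1-4, 2-3.
adj = {1: {3, 4}, 2: {3}, 3: {1, 2}, 4: {1}}
print(complete_sides(adj, {1, 2}, {3, 4}))
# {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
```

In this example the maximal cliques of $\tilde{G}$ are $\{1, 2, 3\}$ and $\{1, 3, 4\}$, which correspond to the maximal bipartite cliques $(\{1, 2\}, \{3\})$ and $(\{1\}, \{3, 4\})$ of $G$.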

Before describing our algorithms, let us present several useful properties of bipartite graphs that reduce the complexity of our problem.

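Lemma 11. Let $K$ $(\neq K_0)$ be a maximal bipartite clique in $G$. If $i > i(K)$, then we have

(i) $K[i]$ can be represented as

$$K[i] = (K_{\le i} \cap \tilde{\Gamma}(v_i)) \cup A(K \cap \Gamma(v_i)). \qquad (3)$$

(ii) $K[i]_{\le i-1} = K_{\le i} \cap \tilde{\Gamma}(v_i)$ is equivalent to (c') $\big(A(K \cap \Gamma(v_i)) \setminus K\big)_{\le i-1} = \emptyset$.

(iii) $K_{\le i} = \tilde{C}(K_{\le i} \cap \tilde{\Gamma}(v_i))_{\le i}$ is always satisfied. □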

From Lemmas 2 and 11, we can reduce the delay to $O(\Delta^3)$ for maximal bipartite cliques.

Theorem 4. Let $G$ be a bipartite graph. Then all maximal bipartite cliques can be generated with $O(\Delta^3)$ time delay and in $O(n + m)$ space, where $O(nm)$ time is required as a preprocessing. □

Moreover, the delay can be improved if we use additional space.

Theorem 5. Let $G$ be a bipartite graph. Then all maximal bipartite cliques can be generated with $O(\Delta^2)$ time delay and in $O(n + m + N\Delta)$ space, where $N$ denotes the number of all maximal bipartite cliques in $G$ and $O(nm)$ time is required as a preprocessing. □

8 Computational Experiments



To evaluate the performance of our algorithms, we implemented our algorithms for sparse graphs from Theorems 2 and 4. We also implemented the algorithm of Tsukiyama et al. and adapted it to bipartite graphs. Our codes are written in C, and the programs run on a PC with a 500 MHz Pentium III and 256 MB of memory under Linux. We examine these algorithms using graphs that are generated randomly and graphs taken from newspaper word data in computational linguistics. The experimental results can be found in the tables below.

Our random graphs are generated as follows. For given $r$ and $n$, we construct a graph with $n$ vertices such that $v_i$ and $v_j$ are adjacent with probability 1/2 if $i + n - j \pmod n < r$ or $j + n - i \pmod n < r$. Bipartite graphs are constructed similarly, with $|V_1| = |V_2|$. We examine the cases $r = 10, 30$ and $n = 1000, 2000, 4000, \ldots, 256000$. Exp. 1 and 2 (resp., Exp. 3) present the results for generating all maximal cliques (resp., all maximal bipartite cliques). Exp. 1 (resp., Exp. 3) shows the computational time to generate 10000 maximal cliques (resp., maximal bipartite cliques), as well as the number of all maximal cliques (resp., all maximal bipartite cliques); the computational time in the tables is expressed in seconds, and we only output the first 10000 cliques if the computational time exceeds 3 hours. Exp. 2 shows the computational time of our algorithm per maximal clique. We also construct graphs $G$ in which a few vertices have large degree, by adding to graphs generated randomly with $r = 10$ a set of 40 vertices whose incident edges are each present with probability 1/2. Similarly to Exp. 2, Exp. 4 shows the computational time of our algorithm per maximal clique. Finally, we examine our algorithm on real data P1, P2, and P3, which are taken from computational linguistics. The results are shown in Exp. 5.
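The random instances described above can be generated by visiting only the $O(nr)$ candidate pairs. This is a sketch under stated assumptions: it follows the inequality exactly as printed (nonzero circular index distance below $r$), uses Python's `random.Random` with an optional seed, and the function name is hypothetical; the bipartite variant is not shown. Every vertex has at most $2(r-1)$ candidate neighbours, so the maximum degree stays bounded as $n$ grows.

```python
import random


def random_band_graph(n, r, seed=None):
    """Random graph on v_1..v_n: each pair whose circular index distance is
    nonzero and below r becomes an edge with probability 1/2."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(1, n + 1)}
    seen = set()
    for i in range(1, n + 1):
        for d in range(1, r):
            j = (i - 1 + d) % n + 1          # the vertex with j - i = d (mod n)
            pair = (min(i, j), max(i, j))
            if i == j or pair in seen:       # skip self-pairs and repeats
                continue
            seen.add(pair)
            if rng.random() < 0.5:
                adj[i].add(j)
                adj[j].add(i)
    return adj


g = random_band_graph(20, 3, seed=1)
print(max(len(nb) for nb in g.values()))  # at most 2 * (3 - 1) = 4
```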



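Exp. 1: maximal cliques, r = 10, 30

| # vertices | Tsukiyama, r = 10 (s) | Tsukiyama, r = 30 (s) | Ours, r = 10 (s) | Ours, r = 30 (s) | # output, r = 10 | # output, r = 30 |
|---|---|---|---|---|---|---|
| 1000 | 378 | 1755 | 0.64 | 4.41 | 2774 | 20571 |
| 2000 | 761 | 4478 | 0.65 | 4.44 | 5553 | 41394 |
| 4000 | 1410 | 9912 | 0.72 | 4.47 | 11058 | 83146 |
| 8000 | 3564 | 21085 | 0.73 | 4.56 | 22133 | 168049 |
| 10000 | 5123 | 25345 | 0.72 | 4.51 | 27624 | 209594 |
| 16000 | - | - | 0.74 | 4.54 | 44398 | 336870 |
| 32000 | - | - | 0.75 | 4.55 | 89120 | 675229 |
| 64000 | - | - | 0.81 | 4.91 | 179012 | 1352210 |
| 128000 | - | - | 0.82 | 4.88 | 357657 | 2711564 |
| 256000 | - | - | 0.82 | 4.88 | 716978 | 5411519 |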


Exp. 2: maximal cliques, # vertices = 10000

| r | 10 | 20 | 40 | 80 | 120 | 160 | 240 | 320 | 480 | 640 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ours | 0.23 | 0.31 | 0.51 | 1 | 1.7 | 2.4 | 4.1 | 5.7 | 9.8 | 14 |



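Exp. 3: maximal bipartite cliques, r = 10, 30

| # vertices | Tsukiyama, r = 10 (s) | Tsukiyama, r = 30 (s) | Ours, r = 10 (s) | Ours, r = 30 (s) | # output, r = 10 | # output, r = 30 |
|---|---|---|---|---|---|---|
| 1000 | 104 | 282 | 0.33 | 1.08 | 2085 | 9136 |
| 2000 | 214 | 582 | 0.32 | 1.08 | 4126 | 18488 |
| 4000 | 446 | 1190 | 0.3 | 1.09 | 8316 | 40427 |
| 8000 | 966 | 2455 | 0.3 | 1.1 | 16609 | 68597 |
| 10000 | 1260 | 3100 | 0.27 | 1.09 | 20862 | 101697 |
| 16000 | - | - | 0.3 | 1.11 | 33586 | 165561 |
| 32000 | - | - | 0.3 | 1.12 | 67143 | 322149 |
| 64000 | - | - | 0.34 | 1.22 | 134911 | 625385 |
| 128000 | - | - | 0.34 | 1.22 | 270770 | 1233989 |
| 256000 | - | - | 0.35 | 1.26 | 541035 | 8351277 |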



Exp. 4: including 40 vertices with large degree, r = 10

| # vertices | Ours | # output |
|---|---|---|
| 1000 | 1.07 | 9136 |
| 2000 | 1.14 | 18488 |
| 4000 | 1.12 | 40427 |
| 8000 | 1.31 | 68597 |
| 10000 | 1.21 | 101697 |
| 16000 | 1.36 | 165561 |
| 32000 | 1.74 | 322149 |
| 64000 | 2.62 | 625385 |
| 128000 | 4.02 | 1233989 |
| 256000 | 7.8 | 2307135 |

Exp. 5: Real world data

| | # vertices (V1, V2) | # edges | # max cliques | time |
|---|---|---|---|---|
| P1 | 22677, 18484 | 247003 | 2700737 | 291 |
| P2 | 33347, 32757 | 233450 | 1892469 | 255 |
| P3 | 20433, 4297 | 127713 | 11860169 | 974 |



From the results in Exp. 1 and 3, we can see that our algorithms are much faster than the algorithm of Tsukiyama et al. The computational time of the algorithm of Tsukiyama et al. is linear in the number of vertices, whereas that of our algorithms does not depend on the number of vertices, since the maximum degree is small. From Exp. 2, we can see that the computational time of our algorithm per maximal clique is close to $O(\Delta)$, which is almost linear in the output size. Exp. 4 shows that the computational time does not increase much even if the graphs contain some vertices of large degree. Exp. 5 shows that problems P1, P2, and P3 can be solved efficiently. We note that the algorithm of Tsukiyama et al. did not terminate within 3 hours on these problems.

Abstract. In this paper, we define and study a natural generalization of the multicut and multiway cut problems: the minimum multi-multiway cut problem. The input to the problem is a weighted undirected graph $G = (V, E)$ and $k$ sets $S_1, S_2, \ldots, S_k$ of vertices. The goal is to find a subset of edges of minimum total weight whose removal completely disconnects each one of the sets $S_1, S_2, \ldots, S_k$, i.e., disconnects every pair of vertices $u$ and $v$ such that $u, v \in S_i$ for some $i$. This problem generalizes both the multicut problem, when $|S_i| = 2$ for $1 \le i \le k$, and the multiway cut problem, when $k = 1$.

We present an approximation algorithm for the multi-multiway cut problem with an approximation ratio which matches that obtained by Garg, Vazirani, and Yannakakis [GVY96] on the standard multicut problem. Namely, our algorithm has an $O(\log k)$ approximation ratio. Moreover, we consider instances of the minimum multi-multiway cut problem which are known to have an optimal solution of light weight. We show that our algorithm has an approximation ratio substantially better than $O(\log k)$ when restricted to such “light” instances. Specifically, we obtain an $O(\log LP)$-approximation algorithm for the problem when all edge weights are at least 1, where $LP$ is the value of a natural LP relaxation of the problem. The latter improves the $O(\log LP \log\log LP)$ approximation ratio for the minimum multicut problem (implied by the work of Seymour [Sey95] and Even et al. [ENSS98]).



1 Introduction

The input to the minimum multicut problem is an undirected graph $G = (V, E)$ with a weight (or cost) function $w : E \to \mathbb{R}^+$ defined on its edges, and a collection $(s_1, t_1), \ldots, (s_k, t_k)$ of vertex pairs. The objective is to find a subset of edges of minimum total weight whose removal disconnects $s_i$ from $t_i$, for every $1 \le i \le k$. The problem is known to be APX-hard [DJP+94]. An $O(\log k)$-approximation algorithm for the problem was obtained by Garg, Vazirani and Yannakakis [GVY96].

* Work supported in part by THE ISRAEL SCIENCE FOUNDATION founded by 
The Israel Academy of Sciences and Humanities. 

** Part of this work was done while visiting the School of Computer Science, Tel Aviv 
University, Tel Aviv 69978, Israel. 



T. Hagerup and J. Katajainen (Eds.): SWAT 2004, LNCS 3111, pp. 273-284, 2004.
© Springer-Verlag Berlin Heidelberg 2004

The minimum multiway cut problem is a special case of the minimum multicut problem. The input consists of a weighted undirected graph $G = (V, E)$, as in the multicut problem, and a set $\{t_1, \ldots, t_k\}$ of vertices. The goal is to find a subset of edges of minimum total weight whose removal disconnects $t_i$ from $t_j$, for every $1 \le i < j \le k$. The problem is also known to be APX-hard [DJP+94]. A $(\frac{3}{2} - \frac{1}{k})$-approximation algorithm for the problem was obtained by Calinescu, Karloff and Rabani [CKR98]. An improved $(1.3438 - \epsilon_k)$-approximation algorithm for the problem was obtained by Karger et al. [KKS+99]. In particular, for $k = 3$ the algorithm of Karger et al. [KKS+99] achieves an approximation ratio of 12/11, which matches the integrality gap of the linear programming relaxation. This result was also obtained independently by Cunningham and Tang [CT99].

In this work, we define and study a natural generalization of both the multicut and multiway cut problems: the minimum multi-multiway cut problem. The input of the minimum multi-multiway cut problem consists of an undirected graph $G = (V, E)$ with a weight function $w : E \to \mathbb{R}^+$ defined on its edges, and $k$ sets of vertices $S_1, S_2, \ldots, S_k$ (also referred to as groups). The goal is to find a subset of edges of minimum total weight whose removal disconnects, for every $1 \le i \le k$, every two vertices $u, v \in S_i$. When $|S_i| = 2$ for all $1 \le i \le k$, the minimum multi-multiway cut problem is exactly the minimum multicut problem, and when $k = 1$, the minimum multi-multiway cut problem is the minimum multiway cut problem.
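Feasibility of a candidate multi-multiway cut can be tested with a union-find structure: delete the cut edges, merge the endpoints of the remaining edges, and reject if two members of some group land in the same component. A minimal sketch with vertices $0, \ldots, n-1$ and a hypothetical function name; with groups of size 2 it checks a multicut, and with a single group a multiway cut.

```python
def is_multi_multiway_cut(n, edges, groups, removed):
    """True if deleting `removed` from G leaves no two vertices of the same
    group connected.  Vertices are 0..n-1; edges are (u, v) pairs."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    removed = {frozenset(e) for e in removed}
    for u, v in edges:
        if frozenset((u, v)) not in removed:
            parent[find(u)] = find(v)
    for group in groups:
        roots = [find(s) for s in group]
        if len(set(roots)) < len(roots):  # two members share a component
            return False
    return True


edges = [(0, 1), (1, 2), (2, 3)]
print(is_multi_multiway_cut(4, edges, [{0, 2}, {1, 3}], [(1, 2)]))  # True
```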



The minimum multicut problem. The minimum multicut problem (and its relation to multicommodity flow) has been extensively studied during the last few decades. The problem in which $k = 1$ is the standard $s$-$t$ cut problem, and is known to be solvable exactly in polynomial time [FF56]. The case in which $k = 2$ was also shown to be polynomially solvable by Yannakakis et al. [YKCP83] using multiple applications of the max-flow algorithm. For any $k \ge 3$ the problem was proven to be APX-hard by Dahlhaus et al. [DJP+94] and thus does not admit a PTAS (unless P = NP).

The currently best known approximation ratio for the minimum multicut problem is obtained in the work of Garg, Vazirani, and Yannakakis [GVY96]. They present a polynomial-time algorithm that, given a graph $G$ and a set of $k$ pairs of vertices, finds a multicut of weight at most $O(\log k)$ times the optimal multicut in $G$. Their algorithm is based on a natural linear programming relaxation of the minimum multicut problem and has the following outline. By solving the relaxation, a fractional multicut of the given graph $G$ is obtained. It can be seen that this fractional solution induces a semi-metric on the vertices of $G$. This semi-metric is then used to round the fractional multicut into an integral one. Namely, the so-called region growing scheme (introduced by Leighton and Rao [LR99] and also used by Klein et al. [KRAR95]) is applied to define for each pair $(s_i, t_i)$ a region, i.e., a subset of vertices, which in this case is a ball of a specific radius centered at $s_i$. The multicut obtained by the algorithm is then defined as all edges in $E$ which are cut by one of the defined regions.

Several results in the field of approximation algorithms have been inspired by the region growing technique for rounding the solution of linear programs. These include applications of the divide and conquer paradigm (see, for example, a survey by Shmoys [Shm96]), the design of approximation algorithms for the minimum multicut problem on directed graphs [KPRT93, ENSS98, CKR01, Gup03], and the results recently obtained for the minimum correlation clustering problem [DI03, CGW03].

In this work we study the region growing rounding technique when applied 
to the multi-multiway cut problem. 

Our results. In this paper we present two main results. First, we present an approximation algorithm for the multi-multiway cut problem with an approximation ratio which matches that obtained by [GVY96] on the standard multicut problem. Namely, our algorithm has an $O(\log k)$ approximation ratio. Our algorithm solves a natural linear programming relaxation of the multi-multiway cut problem, and rounds the fractional solution obtained using an enhanced region growing technique. Roughly speaking, the region growing technique used in this work differs from that used in previous works in that multiple regions are grown simultaneously rather than one by one.

Secondly, we consider instances of the minimum multi-multiway cut problem which are known to have an optimal solution of light weight. We refer to such instances as light instances. We show that our algorithm has an approximation ratio substantially better than $O(\log k)$ when restricted to such light instances. Considering the connection between the multi-multiway cut problem and the closely related minimum uncut problem, we show that our result on light instances of minimum multi-multiway cut implies a result of independent interest on the minimum uncut problem. Our results can be summarized as follows.

Theorem 1 (General multi-multiway cuts). There exists a polynomial-time algorithm which approximates the minimum multi-multiway cut problem within an approximation ratio of $4\ln(k + 1)$.



Theorem 2 (Light multi-multiway cuts). Let $I$ be an instance of the minimum multi-multiway cut problem. Let $\mathrm{Opt}_I$ be the weight of the optimal multi-multiway cut of instance $I$. If $w(e) \ge 1$ for all $e \in E$, then one can approximate the minimum multi-multiway cut problem on $I$ within an approximation ratio of $4\ln(2\,\mathrm{Opt}_I)$.



Corollary 1 (Light minimum uncut). If an undirected graph $G = (V, E)$ can be made bipartite by the deletion of $k$ edges, then a set of $O(k \log k)$ edges whose deletion makes the graph bipartite can be found in polynomial time.

A few remarks are in order. Recall that the multi-multiway cut problem is a generalization of both the multicut and multiway cut problems. Hence, our results on the multi-multiway cut problem apply to both these problems as well. Specifically, Theorem 2 implies a $4\ln(2\,\mathrm{Opt}_I)$ approximation ratio for the standard minimum multicut problem.

To the best of our knowledge, light instances of the minimum multicut problem have not been addressed directly in the past. However, light instances of the symmetric multicut problem on directed graphs¹ have been considered. Namely, Seymour [Sey95] proved an existential result which implies (via [ENSS98]) an $O(\log(\mathrm{Opt}_I) \log\log(\mathrm{Opt}_I))$ approximation algorithm for the symmetric multicut problem in the directed case (under the same setting as Theorem 2). This in turn implies an $O(\log(\mathrm{Opt}_I) \log\log(\mathrm{Opt}_I))$ approximation algorithm for the undirected case as well. Hence, in this case our contribution can also be viewed both as a direct proof and as an improved result for the “light multicut” problem on undirected graphs.

Our second remark addresses Corollary 1, which discusses the familiar minimum uncut problem. Let $G = (V, E)$ be an undirected graph with a nonnegative weight function $w : E \to \mathbb{R}^+$ defined on its edges. The minimum uncut problem is the problem of finding a set of edges of minimum weight whose removal disconnects all odd cycles in $G$, i.e., the resulting graph is bipartite. This problem is also known as a special case of the minimum 2CNF≡ deletion problem. The problem is known to be APX-hard [PY91], and has an $O(\log |V|)$ approximation algorithm [GVY96]. The latter follows by reducing an instance of the minimum uncut problem to an instance of minimum multicut in which there are $|V|$ pairs of terminals. The parameterized complexity of the minimum uncut problem has also been considered in the past (e.g., by Mahajan and Raman [MR99]). To the best of our knowledge, the question whether this problem is fixed-parameter tractable remains open [Dow03]. In other words, given that an undirected graph can be made bipartite by deleting $k$ of its edges, no $O(f(k)\,\mathrm{poly}(|V|))$-time algorithm for finding such $k$ edges is known ($f$ is not necessarily polynomial in $k$). In this case, Corollary 1 implies an $O(\mathrm{poly}(k, |V|))$-time algorithm which finds $O(k \log k)$ such edges. We elaborate on additional results regarding the minimum uncut problem in the discussion section and the appendix of this work.
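The text only says that the $O(\log |V|)$ bound comes from a reduction to a multicut instance with $|V|$ terminal pairs. One standard way to build such an instance, assumed here rather than taken from the paper, is the bipartite double cover: each vertex $v$ gets copies $(v, 0)$ and $(v, 1)$, each edge $(u, v)$ becomes $((u,0),(v,1))$ and $((u,1),(v,0))$ with the same weight, and the pairs are $((v,0),(v,1))$. A pair stays connected exactly when $v$ lies on an odd closed walk, so removing both copies of every edge of an uncut separates all pairs (at most doubling the weight), and the edges of $G$ with a copy in a multicut form an uncut of no larger weight. The sketch below, with hypothetical names, builds the instance and checks the connectivity property with breadth-first search.

```python
from collections import deque


def uncut_to_multicut(vertices, edges):
    """Bipartite double cover: edge (u, v) -> ((u, 0), (v, 1)) and
    ((u, 1), (v, 0)); terminal pairs ((v, 0), (v, 1))."""
    cover_edges = []
    for u, v in edges:
        cover_edges.append(((u, 0), (v, 1)))
        cover_edges.append(((u, 1), (v, 0)))
    pairs = [((v, 0), (v, 1)) for v in vertices]
    return cover_edges, pairs


def some_pair_connected(vertices, edges):
    """True iff some terminal pair is connected in the cover, which happens
    exactly when the original graph contains an odd cycle."""
    cover_edges, pairs = uncut_to_multicut(vertices, edges)
    nbrs = {}
    for a, b in cover_edges:
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    for s, t in pairs:
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for x in nbrs.get(u, []):
                if x not in seen:
                    seen.add(x)
                    queue.append(x)
        if t in seen:
            return True
    return False


print(some_pair_connected([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))             # True: triangle
print(some_pair_connected([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]))  # False: 4-cycle
```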

Organization. The remainder of the paper is organized as follows. In Section 2 we present our algorithm for the minimum multi-multiway cut problem. The proofs of Theorems 1 and 2 appear in Section 2.4. In Section 3 we present the proof of Corollary 1. Finally, in Section 4 we briefly discuss some future research directions regarding both the minimum multi-multiway cut and minimum uncut problems (some of which are further elaborated in the Appendix).

2 The Multi-multiway Cut Problem 

In this section we present our approximation algorithm for the multi-multiway 
cut problem. 

¹ In the symmetric multicut problem on directed graphs we are given a directed graph with $k$ pairs of vertices $(s_i, t_i)$, and our objective is to find a set of edges of minimum weight which disconnects all cycles between $s_i$ and $t_i$ for all $i = 1, \ldots, k$.

2.1 Multi-multiway Cut Linear Programming Relaxation

A multi-multiway cut can be represented by a set of Boolean variables $x(e)$, one for each edge $e \in E$: if $e \in E$ belongs to the multi-multiway cut then $x(e) = 1$, and otherwise $x(e) = 0$. Our objective is to find a minimum weight multi-multiway cut which disconnects every path connecting pairs of vertices from the same group. We denote by $\mathcal{P}$ the set of all such paths. The multi-multiway cut problem may be posed as the following integer program:

$$\begin{aligned} \min \quad & \textstyle\sum_{e \in E} w(e)\, x(e) \\ \text{s.t.} \quad & \textstyle\sum_{e \in P} x(e) \ge 1 && \forall P \in \mathcal{P} \\ & x(e) \in \{0, 1\} && \forall e \in E \end{aligned}$$

By relaxing the integrality condition, we obtain the multi-multiway cut linear programming relaxation:

$$\begin{aligned} \min \quad & \textstyle\sum_{e \in E} w(e)\, x(e) \\ \text{s.t.} \quad & \textstyle\sum_{e \in P} x(e) \ge 1 && \forall P \in \mathcal{P} \\ & x(e) \ge 0 && \forall e \in E \end{aligned}$$

It is not hard to verify that this relaxation can be solved in polynomial time
even though it may involve exponentially many constraints. This
follows from the observation that the variables x(e) induce a semi-metric on the
vertices of the given graph G. Namely, one can define the distance between any
two vertices u and v as the length of the shortest path between u and v, where
every edge $e \in E$ has length x(e). Given this semi-metric, the constraints above
are equivalent to the requirement that the distance between every pair of vertices
belonging to the same group is at least one. This, in turn, implies a natural
separation oracle for the multi-multiway cut linear programming relaxation. There
is also an equivalent linear program of polynomial size. This relaxation has an
integrality gap of $\Omega(\log k)$. Letting LP be the value of the linear relaxation,
the integrality gap is also $\Omega(\log(LP))$ (even on graphs with edge weights $\ge 1$).
Both are implied by the integrality gap of the natural minimum multicut linear
programming relaxation ([GVY96]). This implies that our analysis is tight.
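The separation oracle mentioned above amounts to shortest-path computations: x is feasible exactly when every two terminals of the same group are at $dist_x$-distance at least one. The Python sketch below illustrates this under our own input conventions (vertices 0, ..., n-1, an undirected edge list with `x` indexed like `edges`, and groups given as lists of vertices); the function names are hypothetical, and the code only checks feasibility, it does not solve the LP.

```python
import heapq
from collections import defaultdict


def shortest_distances(n, edges, x, source):
    """Dijkstra from `source`; edge e = (u, v) has length x[e] (its LP value, >= 0)."""
    adj = defaultdict(list)
    for e, (u, v) in enumerate(edges):
        adj[u].append((v, x[e]))
        adj[v].append((u, x[e]))
    dist = [float("inf")] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adj[u]:
            if d + length < dist[v]:
                dist[v] = d + length
                heapq.heappush(heap, (d + length, v))
    return dist


def find_violated_pair(n, edges, x, groups, tol=1e-9):
    """Return a same-group pair (s, t) with dist_x(s, t) < 1, or None if x is feasible.

    A shortest s-t path for the returned pair is a violated constraint sum_{e in P} x(e) >= 1.
    """
    for group in groups:
        for a, s in enumerate(group):
            dist = shortest_distances(n, edges, x, s)
            for t in group[a + 1:]:
                if dist[t] < 1 - tol:
                    return s, t
    return None
```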

2.2 Definitions and Notations 

Let $x = \{x(e)\}_{e \in E}$ be an optimal solution to the multi-multiway cut linear
programming relaxation. Denote by LP the value of the linear program at x,
and denote by $w_{\min}$ the minimal weight of an edge $e \in E$. As mentioned above,
we define the length of an edge $e \in E$ to be x(e), and the length of a path to
be the sum of the lengths of its edges. The distance between a pair of vertices
$u, v \in V$, denoted by $dist_x(u,v)$, is now defined to be the length of the shortest
path between u and v. Let $Cut(S) = \{(u,v) \in E \mid u \in S,\ v \in V \setminus S\}$ for any set
of vertices $S \subseteq V$. Also let $wt(E') = \sum_{e \in E'} w(e)$ for any set of edges $E' \subseteq E$.

We denote the set of all distances of vertices from terminals in $S_i$ by
$Dist_i = \{dist_x(s,u) \mid s \in S_i,\ u \in V\}$. For $r \in [0,\infty)$ the ball of radius r centered at
$s_{ij}$ is defined as $Ball_{ij}(r) = \{v \in V \mid dist_x(s_{ij},v) \le r\}$, where $1 \le i \le k$
and $1 \le j \le |S_i|$. Finally, throughout our work, for various functions f let
$f(x^-) = \lim_{y \to x^-} f(y)$.

Roughly speaking, we will be interested in two properties of a given set of 
balls. The first is the number of edges cut by these balls. The second is the 
number of edges inside the given set of balls, referred to as the volume of the 
balls. 

More specifically, for each i, we consider the set of balls centered at vertices
of $S_i$ where each ball is of equal radius $r \in [0,\infty)$. We define (an upper bound
on) the weight of the edges ‘leaving’ these balls as

$$c_i(r) = \sum_{j=1}^{|S_i|} wt(Cut(Ball_{ij}(r))).$$

The volume of this set of balls is defined to be

$$v_i(r) = \alpha \cdot LP + \sum_{j=1}^{|S_i|} \Bigg( \sum_{\substack{e=(u,v) \in E \\ u,v \in Ball_{ij}(r)}} w(e)\,x(e) \;+ \sum_{\substack{e=(u,v) \in E \\ u \in Ball_{ij}(r),\ v \notin Ball_{ij}(r)}} w(e)\big(r - dist_x(s_{ij},u)\big) \Bigg),$$

where α is a parameter, which does not depend on r, and will be specified later.

A few remarks are in order. First, notice that in $c_i(r)$ an edge may contribute
more than once, as $Cut(Ball_{ij}(r)) \cap Cut(Ball_{ij'}(r))$ is not necessarily empty
(where $1 \le j \ne j' \le |S_i|$), and thus $c_i(r)$ is an upper bound on the value of the
cut. Secondly, in the definition of $v_i(r)$, the summand $\alpha \cdot LP$ should be viewed
as the volume of a set of balls all of radius 0. Finally, note that the function $v_i$
on $[0,\infty)$ is not necessarily continuous but is always continuous from the right.
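The functions $c_i$ and $v_i$ can be evaluated directly from the distances $dist_x(s_{ij}, \cdot)$. The following Python sketch is a straightforward transcription under our own conventions (an undirected edge list, lists `w` and `x` indexed like the edges, and `dists[j]` holding the distances from $s_{ij}$); the name `cut_and_volume` and the `left_limit` flag, which yields $c_i(r^-)$ and $v_i(r^-)$, are our own additions.

```python
def cut_and_volume(edges, w, x, dists, r, alpha, lp, left_limit=False):
    """Evaluate c_i(r) and v_i(r) for one group S_i.

    dists[j][u] = dist_x(s_ij, u).  Ball_ij(r) = {u : dists[j][u] <= r}; with
    left_limit=True the strict ball {u : dists[j][u] < r} is used instead,
    which gives the left limits c_i(r^-) and v_i(r^-).
    """
    c, v = 0.0, alpha * lp  # alpha * LP plays the role of "balls of radius 0"
    for d in dists:
        inside = (lambda u: d[u] < r) if left_limit else (lambda u: d[u] <= r)
        for e, (a, b) in enumerate(edges):
            ia, ib = inside(a), inside(b)
            if ia and ib:
                v += w[e] * x[e]  # edge entirely inside this ball
            elif ia or ib:
                inner = a if ia else b
                c += w[e]  # edge leaves this ball (counted once per ball)
                v += w[e] * (r - d[inner])  # partial volume of the leaving edge
    return c, v
```

For example, on the path 0-1-2 with unit weights, x = 1/2 on both edges, terminals 0 and 2 in one group and α = 0, one gets $v_i(r) = 2r$ and $c_i(r) = 2$ for $r \in (0, 1/2)$, in line with Lemma 1.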



2.3 Algorithm 

Our polynomial time approximation algorithm for multi-multiway cut is de- 
scribed in Figure 1. Roughly speaking, after solving the linear programming 
relaxation from Section 2.1, our algorithm rounds the fractional solution using 
a region growing rounding technique. Namely, for every set $S_i$ which includes a
pair of connected vertices, we simultaneously grow balls of a specific equal radius
$r_i \le 1/2$ around all vertices in $S_i$. The edges in the cut produced by these balls
are then removed from the graph. The radius picked is determined by the
values $c_i(r)$ and $v_i(r)$ defined previously.

Our algorithm depends on two parameters $\alpha \ge 0$ and $\delta \in [0, 1/2)$. We assume
w.l.o.g. that $w(e) > 0$ for all $e \in E$. Therefore, if there exists a path between two
(or more) vertices in $S_i$, then $v_i(r) > 0$ for all $r \in (\delta, \infty)$. In particular, Step 4
of the algorithm is well defined. Note that the values of the functions $c_i$ and
$v_i$ may change during the execution of the algorithm, as edges are removed from E.

In the next subsection we prove the following theorem: 
CutAlg(α, δ)

1. CUT ← ∅
2. Solve the multi-multiway cut linear programming relaxation.
3. While there is a path from some $s' \in S_i$ to $s'' \in S_i$ (where $1 \le i \le k$)
4.     Set $r_i$ to be the radius r in $(Dist_i \cup \{1/2\}) \cap (\delta, 1/2]$ that minimizes
       the value of $c_i(r^-)/v_i(r^-)$.
5.     $F \leftarrow \bigcup_{j=1}^{|S_i|} Cut(Ball_{ij}(r_i^-))$
6.     CUT ← CUT ∪ F
7.     E ← E \ F
8. Return CUT

Fig. 1. Approximation algorithm for minimum multi-multiway cut
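The rounding phase of Fig. 1 (Steps 3 to 8) can be sketched in Python as follows. This is our illustration rather than the authors' code, under stated assumptions: the LP of Step 2 is taken as already solved, with x and its value LP passed in; $dist_x$ is recomputed on the current edge set after each removal, which is one reading of the remark that $c_i$ and $v_i$ change as edges are removed; and groups are handled in a single pass, since removing edges can never reconnect a group. All function names are hypothetical.

```python
import heapq
from collections import defaultdict

INF = float("inf")


def distances(n, edges, x, alive, src):
    """dist_x(src, .) in the current graph, i.e. using only edges in `alive`."""
    adj = defaultdict(list)
    for e in alive:
        u, v = edges[e]
        adj[u].append((v, x[e]))
        adj[v].append((u, x[e]))
    dist = [INF] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, le in adj[u]:
            if d + le < dist[v]:
                dist[v] = d + le
                heapq.heappush(heap, (d + le, v))
    return dist


def left_cut_volume(edges, w, x, alive, dists, r, alpha, lp):
    """c_i(r^-) and v_i(r^-): balls {u : dist < r} around every terminal of the group."""
    c, vol = 0.0, alpha * lp
    for d in dists:
        for e in alive:
            a, b = edges[e]
            ia, ib = d[a] < r, d[b] < r
            if ia and ib:
                vol += w[e] * x[e]
            elif ia or ib:
                c += w[e]
                vol += w[e] * (r - d[a if ia else b])
    return c, vol


def cut_alg_rounding(n, edges, w, x, groups, alpha, delta, lp):
    """Steps 3-8 of CutAlg(alpha, delta), given a feasible fractional x of value lp."""
    alive, cut = set(range(len(edges))), set()
    for group in groups:  # a separated group never reconnects: one pass suffices
        dists = [distances(n, edges, x, alive, s) for s in group]
        if not any(dists[a][t] < INF for a in range(len(group))
                   for t in group if t != group[a]):
            continue  # terminals of this group are already disconnected
        # Candidate radii: (Dist_i ∪ {1/2}) ∩ (delta, 1/2].
        radii = {dv for d in dists for dv in d if delta < dv <= 0.5} | {0.5}
        best_ratio, best_r = INF, 0.5  # v_i > 0 in the paper; 0.5 is only a fallback
        for r in sorted(radii):
            c, vol = left_cut_volume(edges, w, x, alive, dists, r, alpha, lp)
            if vol > 0 and c / vol < best_ratio:
                best_ratio, best_r = c / vol, r
        # F = union over j of Cut(Ball_ij(r_i^-)).
        F = {e for e in alive for d in dists
             if (d[edges[e][0]] < best_r) != (d[edges[e][1]] < best_r)}
        cut |= F
        alive -= F
    return cut
```

On the path 0-1-2 with weights (1, 5), x = (1, 0), one group {0, 2} and α = 1, δ = 0, the sketch removes only the edge (0, 1), although $c_i$ counts that edge twice, once for each ball.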



Theorem 3. CutAlg(α, δ) produces a cut of weight at most

$$\frac{2(1 + k\alpha)}{1 - 2\delta}\, \ln\!\left(\frac{(1 + \alpha)LP}{\alpha LP + 2\delta w_{\min}}\right) LP.$$

As an immediate result of Theorem 3 we obtain Corollaries 2 and 3 below, 
which imply Theorems 1 and 2 stated in the Introduction, respectively. 

Corollary 2. CutAlg(1/k, 0) is a $4\ln(k+1)$-approximation algorithm for
multi-multiway cut.

Corollary 3. If $w(e) \ge 1$ for all $e \in E$, CutAlg(0, 1/4) is a $4\ln(2LP)$-approximation
algorithm for multi-multiway cut.
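As a quick arithmetic check of how Corollaries 2 and 3 follow from Theorem 3, the Python snippet below evaluates the bound of Theorem 3 divided by LP for the two parameter choices. The values k = 10 and LP = 3.7 are arbitrary illustrative numbers, and `cutalg_bound` is a hypothetical helper name. For Corollary 3 the bound gives $4\ln(2LP/w_{\min})$, which is at most $4\ln(2LP)$ when $w_{\min} \ge 1$.

```python
import math


def cutalg_bound(alpha, delta, k, lp, w_min):
    """Right-hand side of Theorem 3 for CutAlg(alpha, delta)."""
    factor = 2 * (1 + k * alpha) / (1 - 2 * delta)
    return factor * math.log((1 + alpha) * lp / (alpha * lp + 2 * delta * w_min)) * lp


k, lp = 10, 3.7
# Corollary 2: alpha = 1/k, delta = 0, so bound / LP should equal 4 ln(k + 1).
print(cutalg_bound(1 / k, 0.0, k, lp, 1.0) / lp, 4 * math.log(k + 1))
# Corollary 3: alpha = 0, delta = 1/4, w_min = 1, so bound / LP should equal 4 ln(2 LP).
print(cutalg_bound(0.0, 0.25, k, lp, 1.0) / lp, 4 * math.log(2 * lp))
```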



2.4 Analysis 

In this subsection we prove Theorem 3. 

Lemma 1. The function $v_i$ is differentiable in $(0,\infty)$ except for a finite number
of points. In addition, if $v_i$ is differentiable at r then $c_i(r) = \frac{d}{dr} v_i(r)$.

Proof. By the definition of $v_i$, if $v_i$ is not differentiable at r, then r must be equal
to $dist_x(s_{ij},u)$ for some $j \in \{1,\ldots,|S_i|\}$ and some vertex $u \in V$. Therefore,
there is only a finite number of values in $(0,\infty)$ at which $v_i$ is not differentiable.
The second statement follows from the definitions of the functions $v_i$ and $c_i$.



Lemma 2. For every $\delta \in [0, 1/2)$, if there is a path between vertices in $S_i$, then
there exists $r \in (\delta, 1/2)$ such that

$$c_i(r) \le \frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right) v_i(r).$$
Proof. Assume on the contrary that for every $r \in (\delta, 1/2)$

$$c_i(r) > \frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right) v_i(r).$$

Recall that $v_i(r) > 0$ for all $r \in (\delta, 1/2)$, as there exists a path between vertices
in $S_i$, and we assume that $w(e) > 0$ for all $e \in E$. Therefore,

$$\int_\delta^{1/2} \frac{c_i(r)}{v_i(r)}\,dr > \left(\frac{1}{2} - \delta\right) \frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right) = \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right).$$



By Lemma 1, $v_i$ is non-differentiable at only a finite number of points; let
$s_1 < \cdots < s_l$ be those lying in $(\delta, 1/2)$. Set $s_0$ to be δ and $s_{l+1}$ to be 1/2.
Now, by Lemma 1, and by the fact that $v_i(r)$ is monotone increasing and continuous
from the right:

$$\int_\delta^{1/2} \frac{c_i(r)}{v_i(r)}\,dr = \sum_{j=0}^{l} \int_{s_j}^{s_{j+1}} \frac{\frac{d}{dr} v_i(r)}{v_i(r)}\,dr = \sum_{j=0}^{l} \left(\ln v_i(s_{j+1}^-) - \ln v_i(s_j)\right) \le \ln v_i\big((1/2)^-\big) - \ln v_i(\delta).$$

As $v_i((1/2)^-) \le (1+\alpha)LP$, the latter yields a contradiction.



Proof (of Theorem 3). First, observe that the set of edges returned by the
algorithm disconnects all terminals within a group, since every path between
terminals has length at least 1, while the radius was chosen to be at most
1/2 (since Step 5 of the algorithm uses the balls $Ball_{ij}(r_i^-)$, this also holds for $r_i = 1/2$).

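Let $I \subseteq \{1, \ldots, k\}$ be the set of group indices for which the algorithm entered
the while loop. The weight of the multicut produced by the algorithm is at most
$\sum_{i \in I} c_i(r_i^-)$ (recall the definition of $r_i$ from Step 4 of our algorithm).

By Lemma 2, for each $i \in I$ there exists $r'_i \in (\delta, 1/2)$ such that

$$c_i(r'_i) \le \frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right) v_i(r'_i).$$

By the choice of $r_i$ (Step 4 of the algorithm), it is not hard to verify that
$c_i(r_i^-)/v_i(r_i^-) \le c_i(r'_i)/v_i(r'_i)$. This follows from the fact that the radius r in
$(\delta, 1/2]$ that minimizes the value of $c_i(r^-)/v_i(r^-)$ is actually in the set $Dist_i \cup \{1/2\}$.
Therefore the weight of the multicut produced by the algorithm is at most

$$\sum_{i \in I} \frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{v_i(\delta)}\right) v_i(r_i^-).$$

Since for all $i \in I$ there exists a path between at least two vertices in the group $S_i$,

$$v_i(\delta) \ge \alpha LP + \sum_{j=1}^{|S_i|} \Bigg( \sum_{\substack{e=(u,v)\in E \\ u,v \in Ball_{ij}(\delta)}} w(e)\,x(e) + \sum_{\substack{e=(u,v)\in E \\ u \in Ball_{ij}(\delta),\ v \notin Ball_{ij}(\delta)}} w(e)\big(\delta - dist_x(s_{ij},u)\big) \Bigg) \ge \alpha LP + 2\delta w_{\min}.$$

Hence, the weight of the multicut produced by the algorithm is bounded by

$$\frac{2}{1-2\delta} \ln\!\left(\frac{(1+\alpha)LP}{\alpha LP + 2\delta w_{\min}}\right) \sum_{i \in I} v_i(r_i^-).$$

Observe that by the definition of $v_i$ and by Step 7 of the algorithm,
$\sum_{i \in I} v_i(r_i^-) \le k\alpha LP + LP$. This completes the proof.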

3 “Light” Minimum Uncut 

Corollary 1. If an undirected graph G = (V,E) can be made bipartite by the
deletion of k edges, then a set of $O(k\log k)$ edges whose deletion makes the graph
bipartite can be found in polynomial time.

Proof. Let G = (V, E) be an undirected graph of size n which can be made
bipartite by the deletion of k edges. Using the reduction presented in [KRAR95]
one can (efficiently) obtain a graph G' = (V', E') (with unit edge weights) and
a set of n pairs of vertices $\{(s_i,t_i)\}_{i=1}^n$ with the following properties: (a) the
minimum multicut on input G', $\{(s_i,t_i)\}_{i=1}^n$ is of value at most 2k, and (b)
given any multicut of G' of weight w one can (efficiently) find at most w edges
in G whose removal results in a bipartite graph. Now using CutAlg(0, 1/4) on
input G', $\{(s_i,t_i)\}_{i=1}^n$ we obtain (Corollary 3) a multicut of G' of weight at most
$O(k\log k)$, which by the above implies our assertion.

4 Discussion 

In this work we have defined and analyzed the multi-multiway cut problem, 
which is a generalization of both the multicut and the multiway cut problems. We 
have presented an approximation algorithm for the minimum multi-multiway cut 
problem with an approximation ratio that matches the currently best known ap- 
proximation ratio for the minimum multicut problem. Moreover, we have shown 
that our algorithm performs significantly better on instances which are known to 
have a “light weight” multicut. The question whether there exists an algorithm 
(for both the minimum multi-multiway cut and the minimum multicut problems)
with an approximation ratio that improves over the presented ratio of $O(\log k)$
remains an intriguing open problem. It is not likely that such an algorithm will
use the standard relaxation of the multi-multiway cut problem we have presented 
in Section 2 in a direct manner due to its large integrality gap. This holds as 
well for the linear program enhanced with the so-called metric constraints. 
When considering the minimum uncut problem, we find ourselves in a similar
situation. Again, the standard linear programming relaxation has an integrality
gap of $\Omega(\log n)$, which matches the currently best known approximation ratio for
the problem (here n is the size of the given graph). In the appendix of this paper,
we show that such an integrality gap holds also when the standard relaxation is 
enhanced with additional odd cycle or metric constraints. The integrality gap of 
the relaxation enhanced with both odd cycle and metric constraints is yet to be 
resolved. 

One may ask similar questions regarding semidefinite programming relax- 
ations of both the minimum multi-multiway cut problem and the minimum uncut 
problem. We have noticed that a natural semidefinite programming relaxation 
for minimum uncut (which is similar to the maximum cut semidefinite programming
relaxation of [GW95]) has an integrality gap of $\Omega(n)$. This leads to the
study of the quality of enhanced semidefinite relaxations, such as those with additional
constraints. A standard set of constraints one may consider are the so-called
triangle constraints (e.g., [FG95]). The integrality gap of such an enhanced
relaxation is yet to be resolved. Nevertheless, one can show that using the standard
random hyperplane rounding technique of [GW95] on such an enhanced relax- 
ation cannot yield an approximation ratio better than implying that 

some other rounding technique is needed. Due to space limitations, our results 
on semidefinite programming relaxations for minimum uncut are omitted. 

Acknowledgments. The authors would like to thank Uri Zwick for many help- 
ful discussions. 
Appendix A

LP Relaxations of the Minimum Uncut Problem 

In this section we mention the limitations of a few natural linear programming 
relaxations of the minimum uncut problem. For a graph G we denote by n the 
number of vertices in G and by C the set of all odd cycles in G. The minimum 
uncut problem may be posed as the following integer program: 

$$\min \sum_{e \in E} w(e)\,x(e)$$
$$\text{s.t.} \quad \sum_{e \in C} x(e) \ge 1 \quad \forall C \in \mathcal{C}, \qquad x(e) \in \{0, 1\} \quad \forall e \in E.$$

Here, if an edge $e \in E$ belongs to the cut we let x(e) = 0, otherwise x(e) = 1.

Viewing the non-edges of G as edges of weight 0, we may add a Boolean
variable x(ij) for each pair of vertices $i \ne j \in V$ such that $ij \notin E$, and add
two sets of constraints. One set, called the metric constraints, corresponds to
the fact that 1 − x forms a semi-metric on the vertices of G. This set includes
the inequalities $1 - x(ij) \le (1 - x(ik)) + (1 - x(jk))$ for all distinct $i,j,k \in V$.
The second set corresponds to any “cycle” of odd length ℓ and includes the
inequalities $\sum_{t=1}^{\ell} x(i_t i_{t+1}) \ge 1$ for all $i_1, \ldots, i_\ell \in V$ (here, $i_{\ell+1} = i_1$).

Relaxing the integrality condition, we obtain the standard linear program- 
ming relaxation for minimum uncut: 

$$\min \sum_{e \in E} w(e)\,x(e)$$
$$\text{s.t.} \quad \sum_{e \in C} x(e) \ge 1 \quad \forall C \in \mathcal{C}, \qquad x(e) \ge 0 \quad \forall e \in E.$$

As in the multicut linear programming relaxation, this relaxation can be
solved in polynomial time even though it may involve exponentially
many constraints. The next two propositions state two integrality gaps.
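The paper does not spell out how the odd-cycle constraints are separated. One standard approach, given here as our own assumption rather than the authors' method, is a shortest-path search in the parity-doubled graph: a closed walk with an odd number of edges contains an odd simple cycle among its edges, and since $x \ge 0$ that cycle is no heavier, so the lightest odd closed walk has the same weight as the lightest odd cycle. The Python sketch covers the standard relaxation (edges of G only); the function names are hypothetical.

```python
import heapq
from collections import defaultdict


def min_odd_closed_walk(n, edges, x):
    """Minimum total x-weight of a closed walk with an odd number of edges.

    Works in the parity-doubled graph: state (v, p) records the parity p of the
    number of edges walked so far, and each edge (u, v) joins (u, p) with (v, 1 - p).
    Returns inf when G is bipartite (no odd cycle).
    """
    adj = defaultdict(list)
    for (u, v), xe in zip(edges, x):
        adj[u].append((v, xe))
        adj[v].append((u, xe))
    best = float("inf")
    for s in range(n):
        dist = {(s, 0): 0.0}
        heap = [(0.0, s, 0)]
        while heap:
            d, u, p = heapq.heappop(heap)
            if d > dist.get((u, p), float("inf")):
                continue  # stale heap entry
            for v, xe in adj[u]:
                nd, state = d + xe, (v, 1 - p)
                if nd < dist.get(state, float("inf")):
                    dist[state] = nd
                    heapq.heappush(heap, (nd, v, 1 - p))
        # Reaching (s, 1) from (s, 0) means an odd closed walk through s.
        best = min(best, dist.get((s, 1), float("inf")))
    return best


def odd_cycle_violated(n, edges, x, tol=1e-9):
    """True if some odd cycle C has sum_{e in C} x(e) < 1."""
    return min_odd_closed_walk(n, edges, x) < 1 - tol
```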

Proposition 1. For any sufficiently large n there is a graph G such that the
standard linear programming relaxation of minimum uncut with the metric
constraints has an integrality gap of $\Omega(\log n)$ on G.

Proof. Let G be a constant degree expander graph with girth $g = \Omega(\log n)$ and
optimal uncut of size $\Omega(n)$ (for the existence of such graphs see, for example, the
explicit construction of Lubotzky, Phillips and Sarnak [LPS88]). For each edge
$e \in E$ assign x(e) = 1/g, and for each non-edge $ij \notin E$ assign x(ij) = 1/2. Then
the value of the linear programming relaxation is $O(|E|/g)$, while the optimal
uncut is of size $\Omega(|E|)$.

Proposition 2. For any sufficiently large n there is a graph G such that the
standard linear programming relaxation of minimum uncut with the second set
of constraints has an integrality gap of $\Omega(\log n)$ on G.

Proof. Use the same graph as in Proposition 1. For each edge $e \in E$ assign
x(e) = 1/g, and for each non-edge $ij \notin E$ assign x(ij) = 1.
Abstract. Given an edge-distance graph of a set of suppliers and clients,
the bottleneck problem is to assign each client to a selected supplier minimizing
their maximum distance. We introduce minimum quantity commitments to balance
the workloads of suppliers, provide a 3-approximation algorithm for the resulting
problem, and study its generalizations.



1 Introduction 

Let G = (V, E) be a complete graph with vertex set V and edge set E. The
distance of every edge between any u and v in V is denoted by d(u,v), which
satisfies the triangle inequality. Let X and Y be the supplier subset and the
client subset of V, respectively, where X and Y are not necessarily disjoint. The
general bottleneck problem is to select a subset $S \subseteq X$ of suppliers and to
assign every client $y \in Y$ to a selected supplier $\phi(y) \in S$, so that the maximum
distance between y and $\phi(y)$ is minimized.

The general bottleneck problem is easy to solve. However, there usually are
other constraints on S. To balance the workloads of the selected suppliers $s \in S$,
here we introduce a minimum quantity commitment constraint, which requires
the number of clients assigned to s to be at least a given natural number, q.
Formally, the bottleneck problem with minimum quantity commitments is
defined by:

$$\min_{S \subseteq X,\ \phi: Y \to S}\ \max_{y \in Y} d(y, \phi(y)),$$

such that

$$|\{y \mid \phi(y) = s,\ y \in Y\}| \ge q, \quad \text{for each } s \in S.$$
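To make the definition concrete, here is a tiny exhaustive solver in Python, intended only for checking heuristics on toy instances (its running time is exponential in |Y|). The function name and the dictionary-of-dictionaries distance format are our own. It takes S to be the set of suppliers actually used by φ, which loses nothing: a selected supplier with no clients can be dropped when q ≥ 1, and the constraint is vacuous when q = 0.

```python
from itertools import product


def brute_force_bottleneck(clients, suppliers, dist, q):
    """Exhaustive search over all assignments phi: Y -> X.

    dist[y][s] is the distance from client y to supplier s.  Returns
    (optimal value, assignment dict), or (inf, None) if no assignment gives
    every used supplier at least q clients.
    """
    best = (float("inf"), None)
    for choice in product(suppliers, repeat=len(clients)):
        load = {}
        for s in choice:
            load[s] = load.get(s, 0) + 1
        if any(count < q for count in load.values()):
            continue  # minimum quantity commitment violated
        value = max(dist[y][s] for y, s in zip(clients, choice))
        if value < best[0]:
            best = (value, dict(zip(clients, choice)))
    return best
```

For instance, with suppliers at positions 0 and 10 and clients at 1, 2, 3 and 9 on a line, the optimum is 3 for q = 1 but 7 for q = 2, where the client at 3 has to be served by the supplier at 10, in the spirit of the gray client of Fig. 1.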

Figure 1 shows a sample instance with two suppliers and four clients, denoted
by rectangles and circles, respectively. The gray client is assigned to the black
supplier in order to satisfy the minimum quantity of q = 2 for every supplier.

Fig. 1. An instance with q = 2

The minimum quantity commitment is motivated by a stipulation of the U.S.
Federal Maritime Commission that requires every supplier shipping to the US to
have at least a minimum amount of cargo. This stipulation makes it harder for
global companies to select shipping suppliers and assign them cargo so as
to minimize the maximum transportation distance. We extend the bottleneck
problem with the minimum quantity commitment. The minimum quantity
commitment is helpful in balancing the workloads of selected suppliers, especially when
contracting the use of large facilities which require a minimum quantity of clients
or transactions in order for the usage to be possible or profitable.

The constraint of the minimum quantity commitment has been studied in
the analysis of supply contracts [2], but never in the study of bottleneck
problems. Without this constraint, however, approximation algorithms for
variations of the bottleneck problem have been extensively studied in the
literature [14]. For instance, when the supplier set and the client set are equal,
restricting the number of selected suppliers to a given integer k leads to the
k-center problem [9], for which the best possible approximation factor of 2 has
been proved and achieved by Hochbaum and Shmoys [7]. Its generalizations with
vertex costs and vertex weights were also considered [5,8,13]. More recently,
Bar-Ilan, Kortsarz and Peleg investigated a capacitated k-center problem where
the number of clients for each center was restricted to a service capacity limit
or maximum load [1]. Their results were improved by Khuller and
Sussmann [11]. Further to this, Krumke developed a “fault tolerant” k-center
problem to ensure that enough backup centers are available for each client [12].
Its approximation algorithms were improved and extended in [3] and [10].

Among those studies, no provision has been made for the minimum quantity
commitment to balance the workloads of suppliers in bottleneck problems. For
this reason, we study the inapproximability of the problem and approximation
algorithms for it. Besides its basic case, the following generalizations with vertex
weights and vertex costs are also considered.

Let w(v) and c(v) be the weight and the cost of each vertex $v \in V$.
On one hand, define the weighted distance from $u \in V$ to $v \in V$ to be
$w(u,v) = d(u,v)\,w(v)$. With this vertex-weight generalization, the
objective function becomes:

$$\min_{S \subseteq X,\ \phi: Y \to S}\ \max_{y \in Y} w(y, \phi(y)).$$

This makes sense especially when we regard 1/w(x) as the shipping speed of
supplier x, so that the weighted distance indicates its shipping delay. Let β denote the
ratio between the maximum value and minimum value of the vertex weights.
Table 1. Summary of approximation factors

|                              | β = 1  | β is arbitrary |
|------------------------------|--------|----------------|
| c(x) = 0 for x ∈ X           | 3†     | 3†             |
| c(x) = 1 for x ∈ X           | 5 (3‡) | 5 (4‡)         |
| c(x) is arbitrary for x ∈ X  | 5 (4‡) | 3 + 2β         |

†: when the best possible factor is achieved
‡: when X = Y

Thus, when β = 1, the vertex-weight generalization has no effect.

On the other hand, we require the total cost of the selected suppliers to be at
most a given budget, k; that is,

$$\sum_{s \in S} c(s) \le k.$$

This vertex-cost generalization produces an interesting special case when
c(x) = 1 for all $x \in X$, which, as in the k-center problem, restricts the number of
selected suppliers to be at most k. However, it has no effect when c(x) = 0
for all $x \in X$.

In addition, special cases with X equal to Y will also be studied, whenever
their approximation factors can be improved.



1.1 Main Results 

Results for the various cases are summarized in Table 1, where the best possible
approximation factor can be proved to be 2 when X = Y, and 3 otherwise.
As shown in Table 1, the best possible factor of 3 has been achieved for the two
cases marked by †, where c(x) = 0 for $x \in X$. When c(x) = 1, a constant factor
of 5, close to 3, can be achieved whether β is one or arbitrary. For
cases with arbitrary c(x), we have obtained an approximation factor of (3 + 2β),
which is equal to 5 when β = 1. Moreover, when assuming X = Y, we improve
the approximation factors to 3, 4 and 4, compared with 5, for the three cases
marked by ‡, respectively.



1.2 Preliminaries 

Throughout this paper, OPT signifies the optimal value of the objective function.
We assume that the complete graph G = (V, E) is directed with $V = \{v_1, \ldots, v_n\}$
and $E = V \times V = \{e_1, \ldots, e_m\}$, where $m = n^2$, and each vertex $v \in V$ has a
self-loop $(v,v) \in E$ with distance d(v,v) = 0. For each edge $e_i \in E$, let $d(e_i)$
and $w(e_i)$ denote its distance and its weighted distance, respectively.

We will frequently use the following concept of domination in this
paper. For any given graph $H = (V_H, E_H)$, a vertex $v \in V_H$ is said to dominate
a vertex $u \in V_H$ if and only if v is equal or adjacent to u. Based on this, given
a supplier set $X \subseteq V_H$ and a client set $Y \subseteq V_H$, we let dom(x) indicate



288 



A. Lim and Z. Xu 



the number of clients in Y dominated by $x \in X$. Moreover, we let I(H)
denote a maximal independent set [6] of H, in which no two different vertices
share an edge and no vertex outside I(H) can be included while preserving its
independence.

The rest of the paper presents approximation algorithms for the bottleneck
problem with minimum quantity commitments and its generalizations. Our
methods extend the threshold technique used for the k-center problem
[8] and are designed to address the new constraint of minimum quantity commitments.
The next three sections present the inapproximability results and
the approximation algorithms for the cases with c(x) = 0, with c(x) = 1, and with
arbitrary c(x), respectively. After that, a brief conclusion is provided in
Section 5.

2 Approximation for c(x) = 0

We begin with the inapproximability result for cases with c(x) = 0 for $x \in X$,
which can be proved by a reduction from the X3C problem [6]; due to lack of
space, the proof is given in [15].

Theorem 1. When c(x) = 0 for all $x \in X$, given any integer $q \ge 3$, no
$(3 - \epsilon)$-approximation algorithm exists for any $\epsilon > 0$ unless NP = P, even if β = 1.

To achieve the best possible factor of 3 even for an arbitrary β, consider
the threshold process shown in Algorithm 1. First, sort the edges
according to their weighted distance so that $w(e_1) \le w(e_2) \le \cdots \le w(e_m)$. Let
$G_i = (V, E_i)$ where $E_i = \{e_1, \ldots, e_i\}$ for $1 \le i \le m$. Then, collect those suppliers
that each dominate at least q clients in $G_i$ to form $Q_i = \{x \mid dom(x) \ge q \text{ in } G_i,\ x \in X\}$.
It is easy to see that if a feasible schedule exists with an objective value
of $w(e_i)$, then the suppliers in $Q_i$ dominate all clients of Y in $G_i$. Therefore, let j
denote the threshold, which is the smallest index i such that $Q_i$ dominates Y in
$G_i$; this ensures $w(e_j) \le OPT$.

Next, we construct an undirected graph $H = (Q_j, E^H)$, where $E^H$ contains
an edge (u,v) for $u, v \in Q_j$ if and only if both u and v dominate a common
client of Y in $G_j$. (Note that H is also called the competition graph of $G_j$
[4].) Then, a maximal independent set of H, denoted by I(H), can be obtained
by Algorithm 2. Since Algorithm 2 iteratively chooses the unmarked vertex of
smallest weight, we know that for each $u \in Q_j$ there exists a vertex $v \in I(H)$,
which marks u in Algorithm 2, such that $(u,v) \in E^H$ and $w(v) \le w(u)$.

Lastly, we assign each client $y \in Y$ to a supplier $\sigma(y) \in I(H)$ as follows.
If y is dominated by a supplier $v \in I(H)$ in $G_j$, we assign $\sigma(y) = v$; otherwise,
let u denote the vertex in $Q_j$ with $(y,u) \in E_j$. Then, assign $\sigma(y) = v'$, where
$v' \in I(H)$ has $(u,v') \in E^H$ and $w(v') \le w(u)$.
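Algorithm 2 is a weighted greedy rule for a maximal independent set. A minimal Python sketch follows, assuming the graph is given as an adjacency dict and breaking weight ties by vertex label (the paper does not specify tie-breaking; the name `greedy_mis` is hypothetical). Besides the independent set, it records for each vertex the chosen vertex that marked it, which is exactly the vertex v used in the assignment step above.

```python
def greedy_mis(vertices, adj, weight):
    """Algorithm 2: repeatedly take the lightest unmarked vertex and mark its neighbours.

    Returns (chosen, marker): chosen is a maximal independent set, and marker[u]
    is the chosen vertex that marked u, so weight[marker[u]] <= weight[u] and
    marker[u] is u itself or adjacent to u.
    """
    unmarked, chosen, marker = set(vertices), [], {}
    while unmarked:
        v = min(unmarked, key=lambda u: (weight[u], u))  # ties broken by label
        chosen.append(v)
        for u in [v] + [nb for nb in adj.get(v, ()) if nb in unmarked]:
            unmarked.discard(u)
            marker[u] = v
    return chosen, marker
```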

To establish its approximation factor, we prove the following theorem.

Theorem 2. When c(x) = 0 for all $x \in X$ and β is arbitrary, Algorithm 1
achieves an approximation factor of 3.
Algorithm 1 (When c(x) = 0 for $x \in X$ and β is arbitrary)

1: Sort edges so that $w(e_1) \le w(e_2) \le \cdots \le w(e_m)$, and construct $G_1, G_2, \ldots, G_m$,
   where $G_i = (V, E_i)$ with $E_i = \{e_1, \ldots, e_i\}$ for $1 \le i \le m$;
2: Let $Q_i = \{x \mid dom(x) \ge q \text{ in } G_i,\ x \in X\}$, for $1 \le i \le m$;
3: Find the threshold, j, that is the smallest index, i, so that $Q_i$ dominates all clients
   of Y in $G_i$;
4: Construct a graph $H = (Q_j, E^H)$, where for any $u, v \in Q_j$, $E^H$ contains an edge
   (u, v) if and only if there exists $r \in Y$ with $(r, u) \in E_j$ and $(r, v) \in E_j$;
5: Call Algorithm 2 to obtain I(H), a maximal independent set of H;
6: for each $y \in Y$ do
7:   if y is dominated by a supplier $v \in I(H)$ in $G_j$ then
8:     Assign y to $\sigma(y) = v$;
9:   else
10:    Let u be the vertex in $Q_j$ with $(y, u) \in E_j$;
11:    Assign y to $\sigma(y) = v'$, where $v' \in I(H)$ has $(u, v') \in E^H$ and $w(v') \le w(u)$;
12:  end if
13: end for
14: Return I(H) and $\sigma(y)$ for $y \in Y$.

Algorithm 2 (Maximal independent set of $H' = (V', E')$ with weights w)

1: $U \leftarrow \emptyset$ and $U' \leftarrow V'$;
2: while $U' \ne \emptyset$ do
3:   Choose the vertex v that has the smallest weight, w(v), among all vertices in $U'$;
4:   $U \leftarrow U \cup \{v\}$ and $U' \leftarrow U' - \{v\}$;
5:   Mark v and all the vertices $u \in U'$ adjacent to v, i.e., $(u,v) \in E'$, by $U' \leftarrow U' - \{u\}$;
6: end while
7: Return U as a maximal independent set of H'.
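Algorithms 1 and 2 can be sketched together in Python as below. This is an illustrative reading, not the authors' implementation: distances come from a function d assumed symmetric with d(v, v) = 0, vertex weights from a dict w, and instead of individual edge indices we scan the distinct weighted-distance values t, which gives the same threshold value $w(e_j)$ because domination only grows as edges are added. Names such as `algorithm1` are hypothetical.

```python
def greedy_mis(vertices, adj, weight):
    """Algorithm 2 (ties broken by label); marker[u] is the chosen vertex that marked u."""
    unmarked, chosen, marker = set(vertices), [], {}
    while unmarked:
        v = min(unmarked, key=lambda u: (weight[u], u))
        chosen.append(v)
        for u in [v] + [nb for nb in adj.get(v, ()) if nb in unmarked]:
            unmarked.discard(u)
            marker[u] = v
    return chosen, marker


def algorithm1(X, Y, d, w, q):
    """Sketch of Algorithm 1 (c(x) = 0, arbitrary beta); returns (t, I(H), sigma) or None."""
    def wd(y, x):
        return d(y, x) * w[x]  # weighted distance w(y, x) = d(y, x) w(x)

    for t in sorted({wd(y, x) for y in Y for x in X}):
        dom = {x: {y for y in Y if wd(y, x) <= t} for x in X}  # clients dominated in G_t
        Q = [x for x in X if len(dom[x]) >= q]
        if set().union(*(dom[x] for x in Q)) == set(Y):
            break  # threshold found: Q dominates every client
    else:
        return None
    # H: competition graph on Q (an edge iff the two suppliers share a dominated client).
    adj = {u: [v for v in Q if v != u and dom[u] & dom[v]] for u in Q}
    indep, marker = greedy_mis(Q, adj, w)
    sigma = {}
    for y in Y:
        owners = [v for v in indep if y in dom[v]]  # at most one, by independence
        if owners:
            sigma[y] = owners[0]
        else:
            u = next(x for x in Q if y in dom[x])
            sigma[y] = marker[u]  # adjacent to u in H, and w(marker[u]) <= w(u)
    return t, indep, sigma
```

For example, with suppliers a and b at positions 0 and 10, clients at 1, 2, 3 and 9, unit weights and q = 2, the sketch finds the threshold t = 7; a and b are adjacent in H through the client at 3, only a is kept, and every client is assigned to a, with maximum weighted distance 9 ≤ 3t.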

Proof. Since the suppliers of $Q_j$ dominate all clients of Y in $G_j$, each $y \in Y$ must
be assigned to a supplier $\sigma(y) \in I(H)$. Because of the independence of I(H), no
client is dominated by two different suppliers of I(H) in $G_j$. Since each supplier
of I(H) dominates at least q clients in $G_j$, the minimum quantity commitment
of q must be satisfied. Moreover, since $w(e_j) \le OPT$, to prove the approximation
factor of 3 we only need to show $w(y, \sigma(y)) \le 3w(e_j)$ for each $y \in Y$. This is trivial
if $\sigma(y)$ dominates y in $G_j$; otherwise, we have $(y,u) \in E_j$, $w(\sigma(y)) \le w(u)$ and
$(u, \sigma(y)) \in E^H$, implying a common client $r \in Y$ with (r, u) and $(r, \sigma(y))$ in $E_j$, as
shown in Figure 2. Hence, $w(y,\sigma(y)) \le (d(y,u) + d(r,u) + d(r,\sigma(y)))\,w(\sigma(y)) \le 3w(e_j)$,
which completes the proof. □

Before we close this section, we show that when X = Y, the best possible
approximation factor becomes 2, instead of 3, for any given integer $q \ge 5$, by
the following theorem, which can be proved by a reduction from a variation of
the X3C problem [6]. (Its proof is given in [15] due to lack of space.)

Theorem 3. When c(x) = 0 for all $x \in X$ and X = Y, given any integer $q \ge 5$,
no $(2 - \epsilon)$-approximation algorithm exists for any $\epsilon > 0$ unless NP = P, even if
β = 1.
Fig. 2. Diagram of the proof for Theorem 2 (labels: u in $Q_j$, $\sigma(y)$ in I(H))

Although no 2-approximation algorithm has been obtained for cases with X = Y
yet, Theorem 3 suggests that the assumption X = Y probably simplifies
the bottleneck problem.

3 Approximation for c(x) = 1

Recall that by Theorem 1, the best possible approximation factor is 3 when
c(x) = 0 for $x \in X$, but only for $q \ge 3$. However, when c(x) = 1 for $x \in X$,
we can show that 3 remains the best possible approximation factor even for
q < 3.

Theorem 4. When c(x) = 1 for all $x \in X$, given any integer $q \ge 0$, no
$(3 - \epsilon)$-approximation algorithm exists for any $\epsilon > 0$ unless NP = P, even if β = 1.

Proof. It can be proved by a reduction from the Minimum Cover problem [6];
due to lack of space, the proof is given in [15]. □

For the case with c(x) = 1 and an arbitrary β, Algorithm 3 guarantees an
approximation factor of 5, which is close to the best possible factor of 3. Compared
with Algorithm 1, Algorithm 3 seeks a different threshold, j, to satisfy the budget
k, as follows. It constructs an undirected graph $H_i$ for $1 \le i \le m$, where, for
any two clients $u, v \in Y$, $H_i$ contains an edge (u,v) if and only if both of them
are dominated by a common supplier $r \in Q_i$ in $G_i$. The threshold
j is then defined to be the smallest index i such that $|I(H_i)| \le k$ for $1 \le i \le m$.

Because no two clients in $I(H_j)$ can be dominated in $G_j$ by a common supplier
in $Q_j$, we have $w(e_j) \le OPT$.

Then, for $y \in Y$, let s(y) indicate the supplier with the lightest weight among
those suppliers in $Q_j$ that dominate y in $G_j$. Let $Q = \{s(y) \mid y \in I(H_j)\}$ be the
subset of lightest-weight suppliers of clients in $I(H_j)$. To ensure the minimum
quantity of q, construct another graph $H^Q = (Q, E^Q)$, where any two suppliers
u and v in Q have an edge $(u, v) \in E^Q$ if and only if they dominate a common
client $r \in Y$ in $G_j$. By Algorithm 2, $I(H^Q)$ can be obtained.

Lastly, we assign each $y \in Y$ to a supplier $\sigma(y) \in I(H^Q)$ as follows. If y is
Algorithm 3 (When c(x) = 1 for $x \in X$ and β is arbitrary)

1: Sort edges so that $w(e_1) \le w(e_2) \le \cdots \le w(e_m)$, and construct $G_1, G_2, \ldots, G_m$ and
   $Q_1, Q_2, \ldots, Q_m$;
2: Construct graphs $H_1, H_2, \ldots, H_m$ on Y, where $H_i$ contains an edge (u,v) for any
   two $u, v \in Y$ if and only if a vertex $r \in Q_i$ exists so that both (u, r) and (v, r) are
   in $E_i$;
3: Find the threshold, j, that is the smallest index, i, such that $|I(H_i)| \le k$ for
   $1 \le i \le m$;
4: Let $Q = \{s(y) \mid y \in I(H_j)\}$, where s(y) indicates the supplier with the lightest
   weight among those suppliers in $Q_j$ that dominate y in $G_j$;
5: Construct a graph $H^Q = (Q, E^Q)$, where for any two suppliers u and v in Q, an
   edge $(u, v) \in E^Q$ if and only if there exists $r \in Y$ with $(r, u) \in E_j$ and $(r, v) \in E_j$;
6: Call Algorithm 2 to obtain $I(H^Q)$, a maximal independent set of $H^Q$;
7: for each $y \in Y$ do
8:   if y is dominated by a vertex $v \in I(H^Q)$ in $G_j$ then
9:     Assign y to $\sigma(y) = v$;
10:  else
11:    Let u be the vertex in $I(H_j)$ with (y, u) in $H_j$;
12:    Let v' be the vertex in $I(H^Q)$ with $(s(u), v') \in E^Q$ and $w(v') \le w(s(u))$;
13:    Assign y to $\sigma(y) = v'$;
14:  end if
15: end for
16: Return $I(H^Q)$ and $\sigma(y)$ for $y \in Y$.

dominated by a supplier $v \in I(H^Q)$, assign $\sigma(y) = v$. Otherwise, let u be the
vertex in $I(H_j)$ with (y,u) in $H_j$. We assign $\sigma(y) = v'$, where $v' \in I(H^Q)$ has
$(s(u), v') \in E^Q$ and $w(v') \le w(s(u))$.
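A corresponding Python sketch of Algorithm 3 follows, under the same conventions and hypothetical naming as the Algorithm 1 sketch (symmetric d with d(v, v) = 0, distinct weighted-distance values scanned in place of edge indices, ties broken by label). One deviation is flagged in the code: Step 3 as printed only tests $|I(H_i)| \le k$, while the sketch additionally requires every client to be dominated by some supplier of $Q_i$, so that s(y) is defined; this extra test holds at the optimal threshold, so it does not affect $w(e_j) \le OPT$. When y itself lies in $I(H_j)$, the sketch takes u = y.

```python
def greedy_mis(vertices, adj, weight):
    """Algorithm 2 (ties broken by label); marker[u] is the chosen vertex that marked u."""
    unmarked, chosen, marker = set(vertices), [], {}
    while unmarked:
        v = min(unmarked, key=lambda u: (weight[u], u))
        chosen.append(v)
        for u in [v] + [nb for nb in adj.get(v, ()) if nb in unmarked]:
            unmarked.discard(u)
            marker[u] = v
    return chosen, marker


def algorithm3(X, Y, d, w, q, k):
    """Sketch of Algorithm 3 (c(x) = 1, arbitrary beta); returns (t, I(H^Q), sigma) or None."""
    def wd(y, x):
        return d(y, x) * w[x]

    for t in sorted({wd(y, x) for y in Y for x in X}):
        dom = {x: {y for y in Y if wd(y, x) <= t} for x in X}
        Qt = [x for x in X if len(dom[x]) >= q]
        # Added in this sketch (not in Step 3): every client must be dominated by Q_t.
        if set().union(*(dom[x] for x in Qt)) != set(Y):
            continue
        # H_t on clients: y ~ z iff some r in Q_t dominates both.
        adjH = {y: [z for z in Y if z != y and any(y in dom[r] and z in dom[r] for r in Qt)]
                for y in Y}
        IH, _ = greedy_mis(Y, adjH, w)
        if len(IH) <= k:
            break
    else:
        return None
    # s(y): lightest supplier of Q_t dominating y (ties broken by label).
    s = {y: min((x for x in Qt if y in dom[x]), key=lambda x: (w[x], x)) for y in Y}
    Q = sorted({s[y] for y in IH})
    adjQ = {u: [v for v in Q if v != u and dom[u] & dom[v]] for u in Q}
    IQ, marker = greedy_mis(Q, adjQ, w)
    sigma = {}
    for y in Y:
        owners = [v for v in IQ if y in dom[v]]
        if owners:
            sigma[y] = owners[0]
        else:
            # u in I(H_t) equal or adjacent to y; exists by maximality of I(H_t).
            u = next(c for c in IH if c == y or c in adjH[y])
            sigma[y] = marker[s[u]]
    return t, IQ, sigma
```

On the same toy line instance as above (suppliers at 0 and 10, clients at 1, 2, 3 and 9, q = 2), the sketch returns threshold 7 for k = 2 and threshold 9 for k = 1.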

To show its approximation factor of 5, we prove the following theorem. 

Theorem 5. When c(x) = 1 for all $x \in X$ and β is arbitrary, Algorithm 3
achieves an approximation factor of 5.

Proof. Since $|Q| \le |I(H_j)| \le k$ and $I(H^Q) \subseteq Q$, we have $|I(H^Q)| \le k$. Because of
the independence of $I(H^Q)$, no client of Y is dominated in $G_j$ by two different
suppliers of $I(H^Q)$. Thus, since $I(H^Q) \subseteq Q \subseteq Q_j$, at least q clients are dominated by,
and therefore assigned to, each supplier $v \in I(H^Q)$. Since $w(e_j) \le OPT$,
to prove the approximation factor of 5 we only need to show $w(y, \sigma(y)) \le 5w(e_j)$. This
is trivial if y is dominated by $\sigma(y) \in I(H^Q)$ in $G_j$. Otherwise, we have $(s(u), \sigma(y)) \in E^Q$
and $w(\sigma(y)) \le w(s(u))$, where (y,u) is in $H_j$. As shown in Figure 3, both s(u)
and $\sigma(y)$ must dominate a common client $v \in Y$, and both y and u must be
dominated by a common supplier $r \in Q_j$. By $w(\sigma(y)) \le w(s(u)) \le w(r)$, we
have $w(y,\sigma(y)) \le (d(y,r) + d(u,r) + d(u,s(u)) + d(v,s(u)) + d(v,\sigma(y)))\,w(\sigma(y)) \le 5w(e_j)$,
which completes the proof. □

The rest of this section studies cases of c(x) = 1 when X = Y. Under the
assumption X = Y, we can extend the inapproximability result in Theorem 3
from $q \ge 5$ to $q \ge 0$ as follows.
Fig. 3. Diagram of the proof for Theorem 5 (labels: r in $Q_j$, s(u) in $Q_j$, $\sigma(y)$ in $I(H^Q)$)

Theorem 6. When c(x) = 1 for all $x \in X$ and X = Y, given any integer $q \ge 0$,
no $(2 - \epsilon)$-approximation algorithm exists for any $\epsilon > 0$ unless NP = P, even if
β = 1.

Proof. It can be proved by a reduction from the Dominating Set problem [6];
due to lack of space, the proof is given in [15]. □

When c(x) = 1 for $x \in X$, the assumption X = Y allows a 3-approximation
(Algorithm 4) and a 4-approximation (Algorithm 5) for the case with β = 1
and the case with arbitrary β, respectively.

Assume V = X = Y. Let us consider β = 1 first. As shown in Algorithm 4,
edges are sorted by distance in non-decreasing order, and $G_i = (V, E_i)$ where
$E_i = \{e_1, e_2, \ldots, e_i\}$ for $1 \le i \le m$. Let $Q_i = \{v \mid dom(v) \ge q \text{ in } G_i,\ v \in V\}$ for
$1 \le i \le m$. To ensure the minimum quantity of q, construct an undirected graph
$T_i = (Q_i, E_i^T)$, where $E_i^T$ contains an edge (u,v) for any two $u, v \in Q_i$ if and
only if there exists $r \in V$ such that either r dominates both u and v in $G_i$, or
both u and v dominate r in $G_i$. Accordingly, the threshold j is defined to be the
smallest index i such that $Q_i$ dominates all vertices of V in $G_i$ and $|I(T_i)| \le k$.
Because no two different vertices in $I(T_i)$ can be dominated by a common vertex
in $G_i$, we have $w(e_j) \le OPT$.

Lastly, we assign each y ∈ V to σ(y) ∈ I(T_j) as follows. If y is dominated by a vertex v ∈ I(T_j), assign σ(y) = v. Otherwise, let u be a vertex in Q_j with (y, u) ∈ E_j, and assign σ(y) = v', where v' ∈ I(T_j) is a vertex with (u, v') ∈ E^T_j. To establish the approximation factor of 3, we prove the following theorem.

Theorem 7. If X = Y, c(x) = 1 for all x ∈ X, and β = 1, Algorithm 4 achieves an approximation factor of 3.

Proof. We have seen that |I(T_j)| ≤ k and that, for each y ∈ V, at most one supplier in I(T_j) dominates y in G_j. Thus, for each v ∈ I(T_j) ⊆ Q_j, at least q vertices are dominated in G_j by v and are therefore assigned to v, so the minimum quantity commitment is satisfied. Moreover, for each y ∈ V, if y is dominated by σ(y) in G_j, then d(y, σ(y)) ≤ d(e_j); otherwise, there exists a vertex u ∈ Q_j with (u, y) ∈ E_j and (u, σ(y)) ∈ E^T_j, which implies a vertex r ∈ V

Algorithm 4 (When X = Y = V, c(x) = 1 for x ∈ X and β = 1)
1: Sort edges so that d(e_1) ≤ d(e_2) ≤ ... ≤ d(e_m), and construct G_1, G_2, ..., G_m;
2: Let Q_i = {v | dom(v) ≥ q in G_i, v ∈ V}, for 1 ≤ i ≤ m;
3: For 1 ≤ i ≤ m, construct an undirected graph T_i = (Q_i, E^T_i), where E^T_i contains an edge (u, v) for any two u, v ∈ Q_i if and only if a vertex r ∈ V exists so that either both (u, r) and (v, r) are in E_i, or both (r, u) and (r, v) are in E_i;
4: Call Algorithm 2 to obtain I(T_i) for 1 ≤ i ≤ m;
5: Find the threshold j, the smallest index i such that Q_i dominates all vertices of V in G_i and |I(T_i)| ≤ k;
6: for each y ∈ V do
7:    if y is dominated by a vertex v ∈ I(T_j) in G_j then
8:       Assign y to σ(y) = v;
9:    else
10:      Let u be the vertex in Q_j with (u, y) ∈ E_j;
11:      Let v' be the vertex in I(T_j) with (u, v') ∈ E^T_j;
12:      Assign y to σ(y) = v';
13:   end if
14: end for
15: Return I(T_j) and σ(y) for y ∈ V.

with either (r, u) and (r, σ(y)) in E_j, or (u, r) and (σ(y), r) in E_j. Both cases lead to d(y, σ(y)) ≤ d(u, y) + d(r, u) + d(r, σ(y)) ≤ 3d(e_j). Since d(e_j) ≤ OPT, the proof is complete. □
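To make the threshold search and the assignment rule of Algorithm 4 concrete, here is a minimal Python sketch under explicit assumptions: the distance `d` is a symmetric function on V, so "u dominates v in G_i" becomes d(u, v) ≤ d(e_i) and every vertex dominates itself; tied edge distances are processed as one group; and Algorithm 2, which is not reproduced in this section, is replaced by a plain greedy maximal independent set (the assignment step only needs I(T_j) to be independent and maximal in T_j). All function names are hypothetical.

```python
from itertools import combinations


def greedy_maximal_independent_set(nodes, adj):
    """Hypothetical stand-in for Algorithm 2: keep each node that has no chosen neighbor."""
    chosen = []
    for v in nodes:
        if all(u not in adj[v] for u in chosen):
            chosen.append(v)
    return chosen


def algorithm4_sketch(V, d, q, k):
    """Threshold search and assignment in the spirit of Algorithm 4 (X = Y = V, c = 1, beta = 1)."""
    for t in sorted({d(u, v) for u in V for v in V}):  # candidate values of d(e_i)
        # dom[v]: vertices dominated by v at threshold t (v itself included, d(v, v) = 0)
        dom = {v: {y for y in V if d(v, y) <= t} for v in V}
        Q = [v for v in V if len(dom[v]) >= q]
        if set().union(*(dom[v] for v in Q)) != set(V):
            continue  # Q_i must dominate every vertex of V
        # T_i: u, v in Q_i adjacent iff some r is dominated by both (d is symmetric here)
        adj = {v: set() for v in Q}
        for u, v in combinations(Q, 2):
            if dom[u] & dom[v]:
                adj[u].add(v)
                adj[v].add(u)
        I = greedy_maximal_independent_set(Q, adj)
        if len(I) > k:
            continue
        sigma = {}
        for y in V:
            hits = [v for v in I if y in dom[v]]  # at most one, by independence
            if hits:
                sigma[y] = hits[0]
            else:
                u = next(u for u in Q if y in dom[u])
                sigma[y] = next(v for v in I if v in adj[u])  # exists by maximality
        return t, I, sigma
    raise ValueError("no feasible threshold: need 1 <= q <= |V| and k >= 1")
```

On the six points 0, 1, 2, 10, 11, 12 of a line with q = 2 and k = 2, the search stops at threshold 1 with centers 0 and 10, and every vertex ends within 3·d(e_j) of its center, in line with the bound of Theorem 7.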

Now we consider the case of arbitrary β. The 4-approximation Algorithm 5 is similar to Algorithm 4 except for two differences. First, Algorithm 5 sorts edges in non-decreasing order of their weighted distances instead of their distances. Second, Algorithm 5 defines s'(v), for v ∈ V, as the supplier with the lightest weight among those suppliers in V that dominate v in G_j. It then selects the set S = {s'(v) | v ∈ I(T_j)} and assigns each y ∈ V to s'(v) (or to s'(v')) instead of to v (or to v', respectively) as in Algorithm 4.
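The two changes in Algorithm 5 can be sketched as follows. We assume, as the form of the bounds in the proof of Theorem 8 suggests, that the weighted distance of serving client y from supplier x is d(x, y)·w(x), and that x dominates y in G_i when this quantity is at most w(e_i); the helper names are hypothetical.

```python
def weighted_distance(d, w, x, y):
    """Assumed weighted distance when supplier x serves client y: d(x, y) * w(x)."""
    return d(x, y) * w[x]


def sorted_weighted_thresholds(V, d, w):
    """Distinct weighted distances in non-decreasing order (step 1 of Algorithm 5)."""
    return sorted({weighted_distance(d, w, x, y) for x in V for y in V})


def lightest_dominator(v, V, d, w, t):
    """s'(v): the lightest supplier that dominates v at weighted threshold t."""
    # v itself always qualifies when t >= 0 and d(v, v) = 0, so the list is never empty
    dominators = [x for x in V if weighted_distance(d, w, x, v) <= t]
    return min(dominators, key=lambda x: w[x])
```

The set S is then obtained by applying `lightest_dominator` to each member of I(T_j) at t = w(e_j).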

Theorem 8. If X = Y, c(x) = 1 for all x ∈ X, and β is arbitrary, Algorithm 5 achieves an approximation factor of 4.

Proof. Following arguments similar to those for Theorem 7, we obtain w(e_j) ≤ OPT, |S| ≤ |I(T_j)| ≤ k, and the satisfaction of the minimum quantity commitments. To prove the approximation factor of 4, we only need to show that w(y, σ(y)) ≤ 4w(e_j). Assume σ(y) = s'(t), where t ∈ I(T_j). If y is dominated by t in G_j, then w(y, σ(y)) ≤ (d(y, t) + d(t, s'(t)))w(s'(t)) ≤ 2w(e_j), because w(s'(t)) ≤ w(t). Otherwise, there exists u ∈ Q_j with (u, y) ∈ E_j, (u, t) ∈ E^T_j and w(t) ≤ w(u). Two cases follow. In case 1, shown in Figure 4(a), there exists a vertex r ∈ V with both (r, u) and (r, t) in E_j. By w(s'(t)) ≤ w(t) ≤ w(u), we have w(y, σ(y)) ≤ (d(y, u) + d(r, u) + d(r, t) + d(t, s'(t)))w(s'(t)) ≤ 4w(e_j). In case 2, shown in Figure 4(b), there exists a vertex r ∈ V with both (u, r) and (t, r) in E_j. By w(s'(t)) ≤ w(t) ≤ w(u) and w(s'(t)) ≤ w(r), we have w(y, σ(y)) ≤ (d(y, u) + d(u, r) + d(t, r) + d(t, s'(t)))w(s'(t)) ≤ 4w(e_j). This completes the proof. □

Fig. 4. Diagram of the proof for Theorem 8: two cases, (a) and (b) (labels in each panel: u in Q_j, t in I(T_j), s'(t) in S)



4 Approximation for Arbitrary c(x)

For cases with arbitrary c(x) and β, a (3 + 2β)-approximation algorithm can be obtained. It is similar to Algorithm 3, except for the definition of the threshold j. For 1 ≤ i ≤ m, let p_i(y) denote the supplier with the cheapest cost among those suppliers in Q_i that dominate y in G_i. Then shift the clients in I(H_i) to their cheapest suppliers in Q_i to form P_i = {p_i(y) | y ∈ I(H_i)}. Accordingly, the threshold j is defined as the smallest index i such that c(P_i) ≤ k, where c(P_i) denotes the total cost of the vertices in P_i, for 1 ≤ i ≤ m. Because no two clients in I(H_j) are dominated by a common supplier in G_j, we have w(e_j) ≤ OPT.

Afterwards, replacing Q by P_j and s(u) by p_j(u) in Algorithm 3, we construct the graph H'_j on P_j, obtain I(H'_j) by Algorithm 2, and assign each y ∈ Y to σ(y) ∈ I(H'_j) through the corresponding steps of Algorithm 3.
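The computation of P_i and of the budgeted threshold can be sketched as below. Algorithm 3 and the graphs H_i are defined earlier in the paper and are not reproduced here, so this sketch simply takes I(H_i), Q_i and the domination test of each G_i as given inputs; the function names and the input layout are hypothetical.

```python
def cheapest_supplier_set(clients, Q, dominates, cost):
    """P_i = {p_i(y) | y in clients}, where p_i(y) is the cheapest supplier in Q dominating y.

    Assumes every client is dominated by at least one supplier of Q.
    """
    P = {min((x for x in Q if dominates(x, y)), key=lambda x: cost[x]) for y in clients}
    return P, sum(cost[x] for x in P)  # (P_i, c(P_i))


def first_level_within_budget(levels, cost, k):
    """Smallest index i with c(P_i) <= k, where levels[i] = (I(H_i), Q_i, dominates_i)."""
    for i, (clients, Q, dominates) in enumerate(levels):
        P, total = cheapest_supplier_set(clients, Q, dominates, cost)
        if total <= k:
            return i, P
    return None
```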

To prove the approximation factor of (3 + 2β), we show the following theorem.



Theorem 9. When c(x) is arbitrary for all x ∈ X and β is also arbitrary, an approximation factor of (3 + 2β) can be achieved.

Proof. We have seen that w(e_j) ≤ OPT and c(I(H'_j)) ≤ c(P_j) ≤ k. By the same arguments as for Theorem 5, at least q clients are assigned to each s ∈ I(H'_j), which satisfies the suppliers' minimum quantity commitments. Furthermore, for each y ∈ Y, if y is dominated by a supplier v ∈ I(H'_j), we have w(y, σ(y)) ≤ w(e_j). Otherwise, (p_j(u), σ(y)) is an edge of H'_j and w(σ(y)) ≤ w(p_j(u)), where (y, u) is in H_j. Thus, both p_j(u) and σ(y) must dominate a common client v ∈ Y, and both y and u must be dominated by a common supplier r ∈ Q_j. However, since only w(σ(y)) ≤ w(p_j(u)) can be ensured here, we have w(y, σ(y)) ≤ (d(y, r) + d(u, r) + d(u, p_j(u)) + d(v, p_j(u)) + d(v, σ(y)))w(σ(y)) ≤ (3 + 2β)w(e_j), which completes the proof. □

Finally, assume X = Y = V. Under this assumption, we show that Algorithm 6 provides an approximation factor of 4 for the case of arbitrary c(x) for x ∈ X and β = 1. The algorithm extends Algorithm 4, but shifts the vertices of I(T_i) to their cheapest suppliers in G_i to form

Algorithm 5 (When X = Y = V, c(x) = 1 for x ∈ X and β is arbitrary)
1: Sort edges so that w(e_1) ≤ w(e_2) ≤ ... ≤ w(e_m), and construct G_1, G_2, ..., G_m;
2: Let Q_i = {v | dom(v) ≥ q in G_i, v ∈ X}, for 1 ≤ i ≤ m;
3: For 1 ≤ i ≤ m, construct an undirected graph T_i = (Q_i, E^T_i), where E^T_i contains an edge (u, v) for any two u, v ∈ Q_i if and only if a vertex r ∈ V exists so that either both (u, r) and (v, r) are in E_i, or both (r, u) and (r, v) are in E_i;
4: Call Algorithm 2 to obtain I(T_i) for 1 ≤ i ≤ m;
5: Find the threshold j, the smallest index i such that Q_i dominates all vertices of V in G_i and |I(T_i)| ≤ k;
6: Let S = {s'(v) | v ∈ I(T_j)}, where s'(v) is the lightest-weight supplier among those suppliers in V that dominate v in G_j;
7: for each y ∈ V do
8:    if y is dominated by a vertex v ∈ I(T_j) in G_j then
9:       Assign y to σ(y) = s'(v);
10:   else
11:      Let u be the vertex in Q_j with (u, y) ∈ E_j;
12:      Let v' be the vertex in I(T_j) with (u, v') ∈ E^T_j and w(v') ≤ w(u);
13:      Assign y to σ(y) = s'(v');
14:   end if
15: end for
16: Return S and σ(y) for y ∈ Y.

P'_i = {p_i(v) | v ∈ I(T_i)} for 1 ≤ i ≤ m. Accordingly, the threshold j is defined to be the smallest index i such that Q_i dominates all vertices of V in G_i and c(P'_i) ≤ k. This satisfies the budget and ensures d(e_j) ≤ OPT.

The assignment of each y ∈ V is almost the same, except that v and v' in Algorithm 4 are replaced by p_j(v) and p_j(v') in Algorithm 6. To show its approximation factor of 4, we prove the following theorem.

Theorem 10. If X = Y, c(x) is arbitrary for all x ∈ X, and β = 1, Algorithm 6 achieves an approximation factor of 4.

Proof. We have seen that c(P'_j) ≤ k and d(e_j) ≤ OPT. By the same arguments as for Theorem 4, the minimum quantity commitment must be satisfied. In addition, for each y ∈ Y, suppose σ(y) = p_j(t), where t ∈ I(T_j). If y is dominated by t, we have d(y, σ(y)) ≤ d(y, t) + d(t, p_j(t)) ≤ 2d(e_j). Otherwise, there exists a vertex u ∈ Q_j with (u, y) ∈ G_j and (u, t) ∈ T_j, which implies a vertex r ∈ V with either (t, r) and (u, r) in E_j, or (r, u) and (r, t) in E_j. Both cases lead to d(y, σ(y)) ≤ d(y, u) + d(u, r) + d(r, t) + d(t, p_j(t)) ≤ 4d(e_j). □

5 Conclusion 

We studied a new bottleneck problem that ensures a minimum quantity commitment for each selected supplier. The novel constraint is motivated by a stipulation of the US Federal Maritime Commission to balance the workloads of suppliers. We provided a polynomial-time algorithm that achieves the best possible approximation factor. We also presented inapproximability results and approximation algorithms for its generalizations that consider the vertex weight and

Algorithm 6 (When X = Y = V, c(x) is arbitrary for x ∈ X and β = 1)
1: Sort edges so that d(e_1) ≤ d(e_2) ≤ ... ≤ d(e_m), and construct G_1, G_2, ..., G_m;
2: Let Q_i = {v | dom(v) ≥ q in G_i, v ∈ V}, for 1 ≤ i ≤ m;
3: For 1 ≤ i ≤ m, construct a graph T_i = (Q_i, E^T_i), where E^T_i contains an edge (u, v) for any two u, v ∈ Q_i if and only if a vertex r ∈ V exists so that either both (u, r) and (v, r) are in E_i, or both (r, u) and (r, v) are in E_i;
4: Let P'_i = {p_i(v) | v ∈ I(T_i)} for 1 ≤ i ≤ m;
5: Find the threshold j, the smallest index i such that Q_i dominates all vertices of V in G_i and c(P'_i) ≤ k;
6: for each y ∈ V do
7:    if y is dominated by a vertex v ∈ I(T_j) then
8:       Assign y to σ(y) = p_j(v);
9:    else
10:      Let u be the vertex in Q_j with (u, y) ∈ G_j;
11:      Let v' be the vertex in I(T_j) with (u, v') ∈ T_j;
12:      Assign y to σ(y) = p_j(v');
13:   end if
14: end for
15: Return P'_j and σ(y) for y ∈ V.

the vertex cost. The approximation factors were proved to be close to the best possible ones. Future work on this problem could include the consideration of center capacities.

Abstract. We consider the problem of assigning jobs to m identical machines. The load of a machine is the sum of the weights of jobs assigned to it. The goal is to minimize the norm of the resulting load vector. It is known that for any fixed norm there is a PTAS. On the other hand, it is also known that there is no single assignment which is optimal for all norms. We show that there exists one assignment which simultaneously guarantees a 1.388-approximation of the optimal assignments for all norms. This improves the 1.5 approximation given by Chandra and Wong in 1975.



1 Introduction 

The problem of machine scheduling is one of the most researched problems in the area of approximation algorithms. The identical machines model is defined by m parallel machines and n independent jobs, where each job j has a non-negative weight w_j. Each job should be assigned to one of the machines, and the load of each machine i, l_i, is defined as the sum of the weights of all jobs assigned to it. The goal of the problem is to find the best assignment. For each specific norm ℓ_p (p ≥ 1), this is defined as the assignment that minimizes $\|(l_1, \ldots, l_m)\|_p = \left(\sum_{i=1}^{m} l_i^{p}\right)^{1/p}$. Specifically, for the ℓ_∞ norm (makespan), the goal is to minimize the maximum load over all the machines. In this paper we describe an algorithm that finds an assignment that simultaneously provides a good approximation for all the optimal assignments of all the ℓ_p norms.
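For concreteness, the objective can be written in a few lines of Python. This is only an illustrative sketch; representing an assignment as a list that gives the machine index of each job is our own choice.

```python
import math


def machine_loads(assignment, weights, m):
    """l_i = total weight of the jobs on machine i; assignment[j] is the machine of job j."""
    loads = [0] * m
    for job, machine in enumerate(assignment):
        loads[machine] += weights[job]
    return loads


def lp_norm(loads, p):
    """||(l_1, ..., l_m)||_p for p >= 1; p = math.inf gives the makespan."""
    if math.isinf(p):
        return max(loads)
    return sum(l ** p for l in loads) ** (1.0 / p)
```

For example, `lp_norm([3, 4], 2)` evaluates to 5.0, and `lp_norm(loads, math.inf)` is the largest machine load.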

It is well known that for any specific norm, one can find for every ε > 0 a polynomial time algorithm for the problem that provides an approximation ratio of 1 + ε (PTAS). Hochbaum and Shmoys [11] showed that there is a PTAS for minimizing the makespan (ℓ_∞). Later, it was shown in [1] that there is a PTAS for every ℓ_p norm. However, this does not mean that for every positive ε there exists an assignment that approximates the optimal assignments of various norms simultaneously, up to a factor of 1 + ε. Actually, in the same paper there is an example that shows that optimal solutions for the ℓ_2 and ℓ_∞ norms are achieved
prove that for every two different norms we can find an input for which the 
optimal assignment of the two norms differ. By that we conclude that in general 
there is no assignment that 1 -I- £ approximates the optimal assignments of both 
norms, for small enough e. Moreover, for any set of different £p norms there is 
an example in which there is a different optimal assignment for each norm of the 
set. 

Now that we know that we cannot find an approximation scheme (not necessarily polynomial time) even for two different norms simultaneously, we want an algorithm that finds an assignment approximating the optimal assignments of all ℓ_p norms simultaneously, with a small constant approximation ratio.

Chandra and Wong showed in [4] that the algorithm that sorts the jobs from the biggest weight to the smallest, and then sequentially and greedily assigns each job to the least loaded machine, gives an approximation ratio of 1.5 for all norms simultaneously. In particular, this algorithm gives an approximation ratio of 4/3 for the ℓ_∞ norm (see [8]), and 1.021 for the ℓ_2 norm.
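The greedy rule analyzed by Chandra and Wong (often called LPT, largest processing time first) is short to state in code. Below is a minimal sketch with a binary heap keyed by the current load; breaking ties by machine index is our own choice.

```python
import heapq


def lpt_assignment(weights, m):
    """Sort jobs by non-increasing weight; put each on the currently least loaded machine."""
    heap = [(0, i) for i in range(m)]  # (current load, machine index); already a valid heap
    assignment = [None] * len(weights)
    loads = [0] * m
    for job in sorted(range(len(weights)), key=lambda j: -weights[j]):
        load, i = heapq.heappop(heap)
        assignment[job] = i
        loads[i] = load + weights[job]
        heapq.heappush(heap, (loads[i], i))
    return assignment, loads
```

With weights (3, 3, 2, 2, 2) on m = 2 machines the sketch returns loads (7, 5), while (6, 6) is optimal, so the makespan ratio on this small instance is 7/6.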

Our results: Our main result is a polynomial time algorithm that provides an assignment that approximates the optimal assignment of all norms simultaneously within a factor of 1.3875. This improves the 1.5 approximation ratio given in [4].

As mentioned above, we also show (in the full version) that for any two norms ℓ_p and ℓ_q there exists an input for which the optimal assignments of the two norms differ, and moreover, for any set of norms ℓ_{p_1}, ℓ_{p_2}, ..., ℓ_{p_k} there is an input for which all the optimal assignments of these norms differ. This proves that an approximation scheme (not necessarily polynomial time) for the problem of approximating two out of several norms simultaneously does not exist.

Other related results: Goel et al. [7] introduced the notion of a globally α-balanced assignment. Goel and Meyerson [6] showed that an assignment which is globally α-balanced (defined later) also α-approximates all optimal assignments for all norms (as well as the optimal assignments for every convex function of the machine loads). They also considered the problem of finding a globally α-balanced assignment for identical machines. They gave a PTAS for finding an assignment that is globally α-balanced with the best (smallest) value of α, but did not give a bound on this α. In this paper we show that this α is actually bounded by 1.3875.

Other scheduling models have also been studied: related machines, restricted assignment (subset model) and unrelated machines. In the related machines model (i.e., machines have fixed speeds) there is no assignment that approximates all norms simultaneously within a constant factor [3]. This obviously holds for the unrelated machines model as well. The same paper also gives an algorithm that simultaneously 2-approximates the optimal solutions of all norms in the restricted assignment model (i.e., each job arrives with a set of machines to which it can be assigned). We note that more is known for approximating any fixed norm. For the related machines model a PTAS was given by Hochbaum and Shmoys [10] for the ℓ_∞ norm and by Epstein and Sgall [5] for any fixed ℓ_p norm. For the restricted assignment model a 2-approximation was achieved by Lenstra et al. [13] for the ℓ_∞ norm and by [3] for any other ℓ_p norm. Moreover, these papers also showed that a PTAS does not exist for ℓ_∞, nor for any ℓ_p norm (p > 1). In the unrelated machines model a 2-approximation algorithm was achieved by [13] for ℓ_∞, and a Θ(p) approximation ratio for any other ℓ_p norm [2] (see [12] and [14] for other related results).

Paper structure: In section 2 we repeat the definition of global α-balance taken from [7] and explain its connection to the approximation of all norms shown in [6]. We also describe a tool used by the algorithm. In section 3 we describe the all-norm approximation algorithm. In subsection 3.1 we show how to easily handle the huge jobs, i.e., jobs which are larger than the average load on the machines. In subsection 3.2 we show how to assign the small jobs (defined later) after assigning the big ones without increasing the imbalance. In subsection 3.3 we show how to find a balanced assignment for the big jobs, which completes the algorithm and its proof.

2 Definitions and Observations 

We use the definition of global α-balance from [7] to prove the all-norm approximation. We briefly repeat some definitions and a theorem that explain the importance of this property. Let S_k(x) denote the sum of the loads of the k most loaded machines in the assignment x, for 1 ≤ k ≤ m.

Definition 1. For α ≥ 1, given two assignments x and y (not necessarily of the same jobs), we say that x is α-submajorized by y if for every k (1 ≤ k ≤ m), S_k(x) ≤ αS_k(y). This will be denoted by x ≼_α y.

Definition 2. An assignment P is called globally α-balanced if for any other feasible assignment P' of the same jobs, we have P ≼_α P'.
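Definitions 1 and 2 can be checked mechanically on small instances. The sketch below works with load vectors only (a simplification of ours), computes the prefix sums S_k, tests α-submajorization, and finds by brute force over all m^n assignments of the same jobs the smallest α for which a given load vector is globally α-balanced; the function names are hypothetical and the brute force is meant only for tiny inputs.

```python
from itertools import accumulate, product


def prefix_sums(loads):
    """[S_1(x), ..., S_m(x)]: sums of the k most loaded machines."""
    return list(accumulate(sorted(loads, reverse=True)))


def is_submajorized(x, y, alpha=1.0, eps=1e-12):
    """Definition 1: S_k(x) <= alpha * S_k(y) for every k (same number of machines)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of machines")
    return all(sx <= alpha * sy + eps for sx, sy in zip(prefix_sums(x), prefix_sums(y)))


def best_alpha(x, y):
    """Smallest alpha with S_k(x) <= alpha * S_k(y) for all k (requires S_1(y) > 0)."""
    return max(sx / sy for sx, sy in zip(prefix_sums(x), prefix_sums(y)))


def global_balance(loads, weights, m):
    """Smallest alpha for which `loads` is globally alpha-balanced (Definition 2),
    by enumerating every assignment of the same jobs to the m machines."""
    worst = 0.0
    for assign in product(range(m), repeat=len(weights)):
        other = [0] * m
        for job, machine in enumerate(assign):
            other[machine] += weights[job]
        worst = max(worst, best_alpha(loads, other))
    return worst
```

For instance, the load vector (7, 5) obtained by putting jobs 3, 2, 2 on one machine and 3, 2 on the other is globally 7/6-balanced: the only binding comparison is with the perfectly balanced assignment (6, 6).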

The next theorem defines our way of proving an all-norm approximation. Its proof is based on the basic theorem of Hardy et al. [9] (see [6]).

Theorem 1. If an assignment is globally α-balanced, then it α-approximates the optimal assignment of all ℓ_p norms (p ≥ 1).

Thus, to prove that an assignment P α-approximates the optimal assignment of each ℓ_p norm, we pick any other assignment P' and prove that P ≼_α P'. A useful tool used by our algorithm is separating the problem into smaller problems. For that we define the union of assignments.

Definition 3. Given an assignment P_1 of n_1 jobs on m_1 machines, and an assignment P_2 of n_2 jobs on m_2 machines, the (disjoint) union of P_1 and P_2 (denoted by P_1 ∪ P_2) is the assignment on m_1 + m_2 machines that assigns the n_1 jobs from P_1 on m_1 machines as P_1, and the n_2 jobs from P_2 on the m_2 machines as P_2.

It is easy to see that, given two assignments P_1 on m_1 machines and P_2 on m_2 machines, we have

$$S_k(P_1) + S_l(P_2) \le S_{k+l}(P_1 \cup P_2) \tag{1}$$

for every k, 1 ≤ k ≤ m_1, and l, 1 ≤ l ≤ m_2.
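Inequality (1) holds because the k most loaded machines of P_1 together with the l most loaded machines of P_2 are some k + l machines of the union, and S_{k+l} takes the heaviest k + l. A quick numerical check on load vectors (a sketch; representing an assignment by its load vector is our simplification):

```python
def S(loads, k):
    """S_k: the sum of the k largest entries of a load vector."""
    return sum(sorted(loads, reverse=True)[:k])


def disjoint_union(p1, p2):
    """Load vector of P1 ∪ P2: the machines of P1 followed by those of P2."""
    return list(p1) + list(p2)
```

With P_1 = (5, 1) and P_2 = (4, 3, 0), taking k = 2 and l = 1 gives 6 + 4 = 10 on the left, while S_3 of the union is 12, so the inequality can be strict.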

Lemma 1. Let P_1, Q_1 be two different assignments on m_1 machines (not necessarily consisting of the same jobs), and let P_2, Q_2 be two different assignments on m_2 machines. If P_1 ≼_α Q_1 and P_2 ≼_α Q_2, then the assignment which is the union of P_1 and P_2 on the m_1 + m_2 machines is α-submajorized by the assignment which is the union of Q_1 and Q_2.

Proof. Consider the k most loaded machines in the union of P_1 and P_2. Assume they include l machines from P_1 and k − l machines from P_2; these are then the l most loaded machines of P_1 and the k − l most loaded machines of P_2. Then for every k, 1 ≤ k ≤ m_1 + m_2, we have

$$S_k(P_1 \cup P_2) = S_l(P_1) + S_{k-l}(P_2) \le \alpha S_l(Q_1) + \alpha S_{k-l}(Q_2) \le \alpha S_k(Q_1 \cup Q_2),$$

where the last inequality follows from (1). □

From the lemma above we can conclude by induction the following lemma: 

Lemma 2. Let P_i, Q_i be two different assignments on m_i machines, for 1 ≤ i ≤ k (not necessarily consisting of the same jobs). If P_i ≼_α Q_i for every i, then $\bigcup_{i=1}^{k} P_i \preceq_\alpha \bigcup_{i=1}^{k} Q_i$ on the $\sum_{i=1}^{k} m_i$ machines.

3 All Norm Approximation 

Our algorithm consists of 4 phases. In the first phase (subsection 3.1), we eliminate the huge jobs (jobs of weight larger than the average load of the machines). In the second phase (subsection 3.2), we eliminate the small jobs (jobs of weight smaller than some constant fraction of the average load of the machines). In the third phase we repeat the first phase for the new huge jobs created by eliminating the small jobs in the second phase. Now we are left only with big jobs, i.e., jobs which are neither huge nor small. In the fourth phase (subsection 3.3), we solve the problem for the big jobs. This yields the main result, stated in Theorem 5, that our algorithm produces a globally 1.3875-balanced assignment.

3.1 Handling Huge Jobs 

We normalize the weights by dividing each of them by $\frac{1}{m}\sum_{i=1}^{n} w_i$. Then $\sum_{i=1}^{n} w_i = m$, i.e., the average load over all machines is exactly 1.

Definition 4. An assignment P is called "reasonable" if, by removing any job from the machine it was assigned to, the load of that machine becomes smaller than 1.

Lemma 3. If assignment P is not reasonable, then there exists a reasonable assignment P' such that P' ≼_1 P.

Proof. If P is not reasonable, then there exist a machine A and a job j on A such that, if we remove j from A, the total load of A is still not smaller than 1. We build the assignment P' by assigning j to a machine B whose load is less than 1 (such a machine must exist). The other jobs are assigned in P' as they were assigned in P. Clearly, P' ≼_1 P, since the total load of A and B is the same in P and P', each of the machines A and B in P' has a smaller load than machine A in P, and all the other machines are unchanged.

If P' is reasonable, we are done. If not, we continue this process with P' until we get a reasonable assignment. This process must end, since there is a finite number of possible assignments and in every step the sum of squares of the loads of the machines is reduced. □
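The improvement step in the proof of Lemma 3 can be run directly. The following Python sketch assumes strictly positive weights already normalized to average load 1, and repeatedly moves a job from a machine whose load stays at least 1 without it to a least loaded machine (the proof allows any machine with load below 1), until the assignment is reasonable. The function names are hypothetical.

```python
def is_reasonable(machines):
    """Definition 4: removing any job leaves its machine with load < 1."""
    return all(sum(jobs) - w < 1 for jobs in machines for w in jobs)


def make_reasonable(machines):
    """Repeat the move from the proof of Lemma 3 until the assignment is reasonable.

    `machines` is a list of job-weight lists with average load 1. With positive
    weights a machine of load < 1 exists whenever a move is needed, and each move
    strictly lowers the sum of squared loads, so the loop terminates.
    """
    machines = [list(jobs) for jobs in machines]  # work on a copy
    while True:
        move = next(((a, j) for a, jobs in enumerate(machines)
                     for j, w in enumerate(jobs) if sum(jobs) - w >= 1), None)
        if move is None:
            return machines
        a, j = move
        b = min(range(len(machines)), key=lambda i: sum(machines[i]))  # a least loaded machine
        assert sum(machines[b]) < 1, "expects positive weights with average load 1"
        machines[b].append(machines[a].pop(j))
```

Starting from six jobs of weight 0.5 placed four, two and zero per machine (loads 2, 1, 0), two moves give loads 1, 1, 1, which is reasonable.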

From this lemma it is clear that, to prove that an assignment P gives an approximation of α for all norms, it is enough to pick any other reasonable assignment P' and prove that P ≼_α P'.

Clearly, in any reasonable assignment, a job whose weight is at least 1 is assigned to a machine by itself. Therefore, our algorithm has the following structure:

all-norm algorithm (preliminary version)

1. Normalize weights to get an average load of 1.

2. While there are jobs of weight ≥ 1 ("huge jobs") do:

a) Assign each of these jobs individually to a machine and delete these jobs and machines.

b) Renormalize weights with the remaining jobs and the remaining machines.

3. Handle the remaining jobs and the remaining machines.

4. Insert the jobs and the machines that were deleted in step 2a.
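Step 2 of the preliminary algorithm can be sketched as follows. We treat a job as huge when its normalized weight is at least 1, matching the observation that such a job is alone on its machine in every reasonable assignment, and we compare each weight with the current average instead of rescaling all weights. Positive weights are assumed, and the function name is hypothetical.

```python
def peel_huge_jobs(weights, m):
    """Repeatedly give every job of weight >= current average load its own machine,
    then renormalize over the remaining jobs and machines.

    Returns (huge_jobs, remaining_jobs, remaining_machine_count).
    """
    huge, rest = [], list(weights)
    while rest and m > 0:
        total = sum(rest)
        # normalized weight w / (total / m) >= 1  <=>  w * m >= total
        big = [w for w in rest if w * m >= total]
        if not big:
            break
        huge.extend(big)
        rest = [w for w in rest if w * m < total]
        m -= len(big)
    return huge, rest, m
```

The renormalization matters: for weights (10, 4, 1, 1) on 3 machines, only the job of weight 10 is huge at first, but after removing it and its machine the average drops to 3 and the job of weight 4 becomes huge as well.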

Lemma 4. If there exists an algorithm for jobs of weight smaller than 1 (where 1 is the average load of the machines) that provides a globally α-balanced assignment, then there is an algorithm that provides a globally α-balanced assignment for any input.

Proof. Use the above algorithm, plugging into step 3 the algorithm that handles jobs of weight smaller than 1. Using Lemma 3 and Lemma 2, it is easy to prove that the above algorithm provides a globally α-balanced assignment. This is done inductively over the iterations of step 2 of the above algorithm. The details appear in the full version. □

3.2 Handling Small Jobs 

We first show that given n jobs, we can separate the jobs into big jobs and small 
jobs, so that if we could find a good assignment for the big jobs, we could easily 
add the small jobs without damaging the balance of the assignment. 

Theorem 2. Given n jobs whose average load is normalized to 1, if there exists an assignment P of all jobs whose weights are bigger than β (0 < β < 1) which is globally (1 + β)-balanced, then adding the small jobs sequentially and greedily in any order (each job on the current least loaded machine) also creates a globally (1 + β)-balanced assignment.

Proof. Let (P_1, P_2, ..., P_m) be the load vector of the machines, ordered in non-increasing order, in the assignment P of the big jobs (the jobs whose weights are bigger than β). Let Q be any other assignment of the big jobs, and let (Q_1, Q_2, ..., Q_m) be the load vector of the machines in Q, ordered in non-increasing order. Then, from the assumption of the theorem,

$$P \preceq_{1+\beta} Q. \tag{2}$$

Let P' be the assignment created by adding the small jobs to P sequentially and greedily, and let (P'_1, ..., P'_m) be the load vector of the machines in P', ordered in non-increasing order. Let Q' be the assignment created by adding the small jobs to Q in an arbitrary way, and let (Q'_1, Q'_2, ..., Q'_m) be the load vector of the machines in Q', ordered in non-increasing order. Note that Q' stands for an arbitrary assignment. Clearly, for every k, 1 ≤ k ≤ m:

$$Q_k \le Q'_k. \tag{3}$$

Let l (l ≤ m) be the largest integer such that P_k = P'_k for every k ≤ l (if there is no such l, we define l = 0). Then for each k ≤ l we have:

$$S_k(P') = \sum_{i=1}^{k} P'_i = \sum_{i=1}^{k} P_i \le (1+\beta)\sum_{i=1}^{k} Q_i \le (1+\beta)\sum_{i=1}^{k} Q'_i = (1+\beta)\,S_k(Q'), \tag{4}$$

where the first inequality follows from (2) and the second one follows from (3).

Note that among the m − l least loaded machines in P', the difference between the loads of the most loaded machine and the least loaded machine is at most β (otherwise, the last job that was greedily assigned to the most loaded machine should not have been assigned to it, since it was not the least loaded machine at that moment). Hence, the difference between each of the loads of these machines and their average load is at most β:

$$P'_j \le \beta + \frac{\sum_{i=l+1}^{m} P'_i}{m-l} = \beta + \frac{m - \sum_{i=1}^{l} P'_i}{m-l} \le \beta + \frac{m - \sum_{i=1}^{l} P_i}{m-l} \tag{5}$$

for every j, l + 1 ≤ j ≤ m. The equality holds since the sum of all the weights in P' is normalized to be m, and the second inequality follows from the definition of l. Note also that for any 1 ≤ k ≤ m − l,

$$\sum_{i=l+1}^{l+k} Q'_i \ge \frac{k}{m-l}\sum_{i=l+1}^{m} Q'_i = \frac{k\left(m - \sum_{i=1}^{l} Q'_i\right)}{m-l}, \tag{6}$$

where the inequality follows from the fact that the sum of the k biggest values in the vector (Q'_{l+1}, Q'_{l+2}, ..., Q'_m) is not smaller than k times the average value in this vector. The equality holds since the sum of all the weights in Q' is normalized to m.

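Now, for every k (1 ≤ k ≤ m − l), we compare the sums of the l + k most loaded machines in both assignments. The sum of the l + k most loaded machines in our assignment is:

$$\begin{aligned} S_{l+k}(P') &= \sum_{i=1}^{l+k} P'_i = \sum_{i=1}^{l} P_i + \sum_{i=l+1}^{l+k} P'_i \\ &\le \sum_{i=1}^{l} P_i + k\left(\beta + \frac{m - \sum_{i=1}^{l} P_i}{m-l}\right) = \left(1 - \frac{k}{m-l}\right)\sum_{i=1}^{l} P_i + k\beta + \frac{km}{m-l} \\ &\le (1+\beta)\left(1 - \frac{k}{m-l}\right)\sum_{i=1}^{l} Q_i + k\beta + \frac{km}{m-l}. \end{aligned} \tag{7}$$

The first inequality follows from (5) and the second inequality follows from (2).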
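The sum of the l + k most loaded machines in the other assignment is:

$$\begin{aligned} S_{l+k}(Q') &= \sum_{i=1}^{l+k} Q'_i = \sum_{i=1}^{l} Q'_i + \sum_{i=l+1}^{l+k} Q'_i \ge \sum_{i=1}^{l} Q'_i + \frac{k\left(m - \sum_{i=1}^{l} Q'_i\right)}{m-l} \\ &= \left(1 - \frac{k}{m-l}\right)\sum_{i=1}^{l} Q'_i + \frac{km}{m-l} \ge \left(1 - \frac{k}{m-l}\right)\sum_{i=1}^{l} Q_i + \frac{km}{m-l}, \end{aligned} \tag{8}$$

where the first inequality follows from (6) and the second one follows from (3).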

From (4), (7) and (8) we get S_k(P') ≤ (1 + β)S_k(Q') for every k, 1 ≤ k ≤ m. (For k > l, write k = l + k'; then (1 + β) times the right-hand side of (8) exceeds the right-hand side of (7) by βk'l/(m − l) ≥ 0.) Since any assignment can be composed from an assignment Q of the big jobs by adding the small jobs in some way, we conclude that our algorithm provides a globally (1 + β)-balanced assignment. □
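The greedy step analyzed in Theorem 2 only needs a priority queue keyed by the current machine load. A minimal sketch, with ties broken by machine index (our choice) and a hypothetical function name:

```python
import heapq


def add_small_jobs_greedily(loads, small_jobs):
    """Add each small job, in the given order, to the currently least loaded machine."""
    heap = [(load, i) for i, load in enumerate(loads)]
    heapq.heapify(heap)
    new_loads = list(loads)
    for w in small_jobs:
        load, i = heapq.heappop(heap)
        new_loads[i] = load + w
        heapq.heappush(heap, (new_loads[i], i))
    return new_loads
```

For machine loads (2, 1, 0) and three small jobs of weight 0.5, the first two jobs go to the empty machine and the third to the machine of load 1, giving (2, 1.5, 1).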



By the above theorem, in order to get an assignment which is (1 + β)-balanced, our algorithm is defined as follows:

all-norm algorithm

1. Normalize weights to get an average load of 1.

2. While there are jobs of weight ≥ 1 ("huge jobs") do:

a) Assign each of these jobs individually to a machine and delete these jobs and machines.

b) Renormalize weights with the remaining jobs and the remaining machines.

3. Put the jobs of weight smaller than β ("small jobs") aside.

4. Renormalize weights to get an average load of 1.

5. While there are jobs of weight ≥ 1 ("huge jobs") do:

a) Assign each of these jobs individually to a machine and delete these jobs and machines.

b) Renormalize weights with the remaining jobs and the remaining machines.

6. Handle the jobs of weight between β and 1 ("big jobs"), as described in subsection 3.3.

7. Insert the jobs and the machines that were deleted in step 5a.

8. Add the small jobs sequentially and greedily.

9. Insert the jobs and the machines that were deleted in step 2a.

Note that after each of steps 3 and 5a the average load of the machines becomes smaller, and therefore after renormalization there are no jobs of weight smaller than β. Hence, step 3 needs to be done only once in order to get jobs of weight between β and 1, as required by step 6. Clearly, we have the following theorem:

Theorem 3. If there exists an algorithm that provides a globally (1 + β)-balanced assignment for an input which consists of jobs of weight between β and 1 (where 1 is the average load of the machines), then there is an algorithm that provides a globally (1 + β)-balanced assignment for any input.

Proof. Use the above algorithm, plugging into step 6 the algorithm that handles jobs of weight between β and 1. By Lemma 4 and Theorem 2, this algorithm provides a globally (1 + β)-balanced assignment. □

Remark: Actually step 2a of the above algorithm may be omitted since 
Theorem 2 holds even if there are jobs of weight larger than 1. However, it is 
more natural to keep step 2a, since the assignments done in step 2a are part of 
every reasonable assignment. 



3.3 Handling Big Jobs 

In this subsection we show how to handle jobs of weight between β and 1 (step 6 of the all-norm algorithm).



Treating a small number of big jobs. We first consider the specific case where the number of big jobs is at most 2m. Since all jobs are of weight at most 1 and the average load is 1, there are at least m jobs. Let the number of jobs be 2m − k, for some 0 ≤ k ≤ m.

Definition 5. Given 2m − k jobs (for a given k, 0 ≤ k ≤ m) of arbitrary weight (not necessarily at least β), "the snake assignment" assigns each of the k largest jobs to a separate machine, and for every i (1 ≤ i ≤ m − k) assigns the (i + k)'th largest job and the i'th smallest job together to a separate machine.
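Definition 5 translates directly into code. The sketch below builds the snake assignment from a list of weights and, for very small inputs, checks the claim of Lemma 5 (stated next) by brute force over all assignments that put at most two jobs on each machine. The function names are hypothetical and the brute force is exponential, so it is meant only as a sanity check.

```python
from itertools import accumulate, product


def snake_assignment(weights, m):
    """Definition 5 for n = 2m - k jobs (0 <= k <= m) on m machines."""
    n = len(weights)
    if not m <= n <= 2 * m:
        raise ValueError("the snake assignment needs m <= n <= 2m jobs")
    w = sorted(weights, reverse=True)
    k = 2 * m - n
    machines = [[x] for x in w[:k]]  # the k largest jobs, each alone
    rest = w[k:]  # the remaining 2(m - k) jobs
    for i in range(m - k):  # (k+i+1)-th largest paired with (i+1)-th smallest
        machines.append([rest[i], rest[-1 - i]])
    return machines


def prefix(loads):
    return list(accumulate(sorted(loads, reverse=True)))


def lemma5_holds(weights, m):
    """Is the snake assignment 1-submajorized by every assignment with <= 2 jobs per machine?"""
    snake = prefix([sum(jobs) for jobs in snake_assignment(weights, m)])
    for assign in product(range(m), repeat=len(weights)):
        if max(assign.count(i) for i in range(m)) > 2:
            continue
        loads = [0] * m
        for job, machine in enumerate(assign):
            loads[machine] += weights[job]
        if any(s > t for s, t in zip(snake, prefix(loads))):
            return False
    return True
```

For weights (9, 8, 5, 4, 4) on 3 machines, k = 1, so the job of weight 9 stays alone and the others are paired as (8, 4) and (5, 4).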

Lemma 5. For the case where there are at most 2m jobs, "the snake assignment" S is 1-submajorized by any assignment which does not assign more than two jobs to each machine, and therefore is optimal in all norms.

Proof. We first show that it is enough to prove the lemma for the case where the number of jobs is exactly 2m. In the general case there are 2m − l jobs, where 0 ≤ l ≤ m. Any instance of 2m − l jobs can be transformed into an instance of 2m jobs by adding l zero-weight jobs without changing the loads (the lemma holds for arbitrary weights, including zero). Moreover, this does not affect the snake assignment.

Consider the case where the number of jobs is exactly 2m. Here all the jobs are divided into pairs, and each pair is assigned to a machine separately. The proof is by induction on m. For m = 1 the claim is trivial. We assume that the snake assignment is optimal for m = k and prove it for m = k + 1. Let R be an arbitrary assignment of these 2(k + 1) jobs to k + 1 machines by pairs. We want to prove that S ≼_1 R. We build an intermediate assignment I_R and prove S ≼_1 I_R ≼_1 R.

If in R the biggest job is assigned to the same machine as the smallest job, we define I_R to be R. Otherwise, we look at the biggest job, whose weight is denoted by w_1, and assume that in R it is assigned to a machine denoted by A together with a job whose weight is denoted by w_2. The smallest job, whose weight is denoted by w_3, is assigned in R to a machine denoted by B together with a job whose weight is denoted by w_4. In I_R, the job of weight w_1 is assigned to A with the job of weight w_3, and the job of weight w_2 is assigned to B with the job of weight w_4. The assignments to the other machines are left unchanged. Let P_1 be the assignment I_R on A ∪ B, and let P_2 be the assignment I_R on the other machines. Let Q_1 be the assignment R on A ∪ B, and let Q_2 be the assignment R on the other machines. Since w_1 + w_3 ≤ w_1 + w_2 and w_2 + w_4 ≤ w_1 + w_2, we have P_1 ≼_1 Q_1. Clearly, P_2 ≼_1 Q_2, and by Lemma 1 we conclude that I_R ≼_1 R.
Next we show that $S \le_1 I_R$. In $I_R$ the biggest job is assigned to the same machine as the smallest. The assignment to this machine will be denoted by $Q_1$. The other jobs are assigned to the other machines arbitrarily. The assignment to the other $k$ machines will be denoted by $Q_2$. In $S$, the biggest job is assigned to the same machine as the smallest. The assignment to this machine will be denoted by $P_1$. The other jobs are assigned to the other machines by the snake assignment. We denote the assignment of $S$ to the other $k$ machines by $P_2$. Clearly, $P_1 \le_1 Q_1$ (the two sub-assignments coincide). From the induction assumption on $k$ machines, $P_2 \le_1 Q_2$, and from Lemma 1, $S \le_1 I_R$, and therefore $S \le_1 R$. This concludes the case of exactly $2m$ jobs. □
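A minimal Python sketch of Definition 5, together with a brute-force check of Lemma 5 on small instances, may help to see the statement at work. Here $x \le_\alpha y$ is taken to mean that every prefix sum of $x$ sorted in decreasing order is at most $\alpha$ times the corresponding prefix sum of $y$; this reading of the relation is our assumption, since its formal definition lies outside this excerpt, and all function names are illustrative. As in the proof, instances with fewer than $2m$ jobs are padded with zero-weight jobs, so that every assignment with at most two jobs per machine corresponds to a pairing.

```python
from itertools import accumulate


def snake_assignment(weights, m):
    """Snake assignment (Definition 5) of 2m - k jobs on m machines, 0 <= k < m."""
    k = 2 * m - len(weights)
    if not 0 <= k < m:
        raise ValueError("expected 2m - k jobs with 0 <= k < m")
    w = sorted(weights, reverse=True)
    machines = [[w[j]] for j in range(k)]      # the k largest jobs run alone
    rest = w[k:]                               # 2(m - k) jobs, largest first
    for i in range(m - k):
        # 0-based i: the (i+1+k)'th largest job with the (i+1)'th smallest job
        machines.append([rest[i], rest[-1 - i]])
    return machines


def loads(assignment):
    return [sum(jobs) for jobs in assignment]


def submajorized(x, y, alpha=1.0, tol=1e-9):
    """x <=_alpha y: each prefix sum of x sorted decreasingly is at most
    alpha times the matching prefix sum of y (assumed definition)."""
    px = accumulate(sorted(x, reverse=True))
    py = accumulate(sorted(y, reverse=True))
    return all(a <= alpha * b + tol for a, b in zip(px, py))


def pairings(items):
    """Yield every split of an even-length list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for j in range(len(rest)):
        for tail in pairings(rest[:j] + rest[j + 1:]):
            yield [(first, rest[j])] + tail


def check_lemma5(weights, m):
    """Brute force: snake loads are 1-submajorized by the loads of every
    pairing of the jobs padded with zero-weight jobs to exactly 2m."""
    k = 2 * m - len(weights)
    padded = list(weights) + [0.0] * k
    snake_loads = loads(snake_assignment(weights, m))
    return all(submajorized(snake_loads, [a + b for a, b in p])
               for p in pairings(padded))


if __name__ == "__main__":
    jobs = [0.9, 0.6, 0.5, 0.4, 0.35]
    print(snake_assignment(jobs, 3))
    print(check_lemma5(jobs, 3))
```

The enumeration grows as $(2m - 1)!!$, so it is only meant for a handful of machines.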



Theorem 4. The snake assignment $S$ is globally $\frac{4}{3}$-balanced when there are no more than $2m$ jobs, each of weight between $\frac{1}{3}$ and 1.

Proof. Let $P$ be another reasonable assignment. We need to prove that $S \le_{4/3} P$. We will use an intermediate assignment $I_P$ and prove that $S \le_1 I_P \le_{4/3} P$. First we notice that in $P$ there is no machine with more than three jobs assigned to it (since $P$ is reasonable and the weights of the jobs are at least $\frac{1}{3}$). The algorithm to create the intermediate assignment is defined by:

1. Initialize i := 1. 

2. While there is a machine with three jobs assigned to it do: 

a) Find such a machine, denoted by $A_i$.

b) Find another machine, denoted by $B_i$, with only one job assigned to it (clearly there is such a machine).

c) Take the smallest job from $A_i$, whose weight is denoted by $x_i$, and assign it to $B_i$.

d) i := i + 1 . 

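The output of this algorithm is the intermediate assignment $I_P$. Note that all the $A_i$'s and $B_i$'s are distinct (after step $i$ both $A_i$ and $B_i$ hold exactly two jobs, so neither can be chosen again), and therefore the process stops after at most $m$ steps.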
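The loop that builds $I_P$ can be sketched in Python as follows, assuming an assignment is given as a list of per-machine job lists (a representation chosen here for illustration). Because every move leaves both $A_i$ and $B_i$ with exactly two jobs, neither machine qualifies again, which is why the loop terminates. The demo also prints the ratio $(1 + x_i)/(3x_i)$ for each moved job; it is at most $\frac{4}{3}$ whenever $x_i \ge \frac{1}{3}$.

```python
def intermediate_assignment(assignment):
    """Sketch of the loop that builds I_P from P in the proof of Theorem 4.

    assignment: list of machines, each a list of job weights.
    Returns (I_P, moves) with moves = [(index of A_i, index of B_i, x_i), ...].
    """
    machines = [list(jobs) for jobs in assignment]   # leave the input untouched
    moves = []
    while True:
        a = next((i for i, jobs in enumerate(machines) if len(jobs) == 3), None)
        if a is None:                                # no machine with three jobs
            break
        b = next((i for i, jobs in enumerate(machines) if len(jobs) == 1), None)
        if b is None:
            raise ValueError("no machine with a single job")
        x = min(machines[a])                         # smallest job on A_i
        machines[a].remove(x)
        machines[b].append(x)                        # moved to B_i
        moves.append((a, b, x))
    return machines, moves


if __name__ == "__main__":
    P = [[0.4, 0.35, 0.34], [0.9], [0.66, 0.35]]
    I_P, moves = intermediate_assignment(P)
    for a, b, x in moves:
        # load ratio bound: (1 + x_i) / (3 x_i) <= 4/3 when x_i >= 1/3
        print(a, b, x, (1 + x) / (3 * x))
    print(I_P)
```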

From the previous lemma it is clear that $S \le_1 I_P$, since $S$ is optimal among all assignments that do not assign more than two jobs to a machine.

We now only have to prove that $I_P \le_{4/3} P$. This will be done by examining step $i$ of the algorithm for creating $I_P$. Let $P_i$ be the assignment $P$ on $A_i \cup B_i$ and let $I_{P_i}$ be the assignment $I_P$ on $A_i \cup B_i$. We want to prove that $I_{P_i} \le_{4/3} P_i$. Since the sum of the loads is the same in both assignments, we only have to compare the machines with the bigger load of the two. In $P_i$ the machine with the biggest load is the one with the three jobs (since all the weights are between $\frac{1}{3}$ and 1). If the most loaded machine in $I_{P_i}$ is the one that had the three jobs before the transformation, then $I_{P_i} \le_1 P_i$. Otherwise, the most loaded machine in $I_{P_i}$ is the one that had one job assigned to it before the transformation. Consider the job of weight $x_i$. In $P_i$ it was assigned to a machine whose load was no less than $3x_i$ (it is the smallest of three jobs). In $I_{P_i}$ it is assigned to a machine whose load is no more than $1 + x_i$. The ratio between the loads of these machines is at most $\frac{1 + x_i}{3x_i}$, which is not more than $\frac{4}{3}$ since $x_i \ge \frac{1}{3}$. Hence $I_{P_i} \le_{4/3} P_i$, and this is true for every pair $A_i, B_i$. Suppose the algorithm stopped after $k$ iterations; then by Lemma 2, since all $A_i$'s and $B_i$'s are distinct, $\bigcup_{i=1}^{k} I_{P_i} \le_{4/3} \bigcup_{i=1}^{k} P_i$. Denote the set of machines not changed by the algorithm by $J$. Clearly, $P$ and $I_P$ are the same on $J$. Let $P_J$ be the assignment $P$ on $J$, and let $I_{P_J}$ be the assignment $I_P$ on $J$. Then $I_{P_J} \le_1 P_J$. From Lemma 1 we have

$$I_P = \Big(\bigcup_{i=1}^{k} I_{P_i}\Big) \cup I_{P_J} \;\le_{4/3}\; \Big(\bigcup_{i=1}^{k} P_i\Big) \cup P_J = P.$$

Therefore, $S \le_{4/3} P$. □




Treating big jobs - the general case. Recall that all jobs are between $\beta$ and 1, and assume that $\frac{1}{3} < \beta < \frac{1}{2}$. Then no reasonable assignment has more than three jobs on one machine. We may also assume that there are more than $2m$ jobs; otherwise, by Theorem 4, we have an assignment which is globally $\frac{4}{3}$-balanced. So there are $2m + k$ jobs to assign, where $1 \le k \le m$.

Lemma 6. If three jobs are assigned to one machine in a reasonable assignment, then none of them has a weight bigger than $1 - \beta$.
Proof. If the lemma is not true, then one of the jobs has a weight bigger than $1 - \beta$. Together with either of the other two jobs, each of weight at least $\beta$, it forms a pair of jobs with total weight bigger than 1, which is not possible in a reasonable assignment. □

For the same reason the next claim is also true: 

Lemma 7. If three jobs are assigned to one machine in a reasonable assignment, 
then at most one job has a weight bigger than 0.5. 

Lemma 8. There are at least $3k$ jobs of weight smaller than 0.5.

Proof. Suppose there are fewer than $3k$ jobs of weight smaller than 0.5. Then there are at least $2m - 2k + 1$ jobs of weight at least 0.5. The other jobs have weight at least $\beta$, and, using $\beta > \frac{1}{3}$, the sum of the weights of all jobs satisfies

$$\sum w \ge 0.5(2m - 2k + 1) + \beta(3k - 1) > m - k + 0.5 + k - \tfrac{1}{3} > m.$$

This contradicts the fact that the sum of all weights is $m$. □

The “big jobs” algorithm (step 6 of the all-norm algorithm):

1. Initialize pool of jobs to include all big jobs. 

2. Do k times: 

a) Take the job of biggest weight smaller than $1 - \beta$ from the pool of jobs and assign it to a new machine.

b) Take the two jobs of biggest weight smaller than 0.5 from the pool of jobs and assign them both to the machine used in the preceding step (a).

3. Now assign the remaining $2m - 2k$ jobs to the remaining $m - k$ machines by the snake assignment.
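A Python sketch of the “big jobs” algorithm follows. It assumes weights normalized so that the average machine load is 1 and an input of $2m + k$ jobs with $1 \le k \le m$; ties are broken by list order, which the description leaves open, and the function names are ours. Each triplet built in step 2 has load at most $(1 - \beta) + 0.5 + 0.5 = 2 - \beta$.

```python
def snake(jobs, m):
    """Snake assignment (Definition 5) of at most 2m jobs on m machines."""
    w = sorted(jobs, reverse=True)
    k = 2 * m - len(w)
    machines = [[w[j]] for j in range(k)]
    rest = w[k:]
    for i in range(m - k):
        machines.append([rest[i], rest[-1 - i]])
    return machines


def big_jobs(jobs, m, beta):
    """Sketch of the "big jobs" algorithm for 2m + k jobs (1 <= k <= m) with
    weights in [beta, 1], normalized so the average machine load is 1."""
    k = len(jobs) - 2 * m
    if not 1 <= k <= m:
        raise ValueError("expected 2m + k jobs with 1 <= k <= m")
    pool = sorted(jobs, reverse=True)          # kept in decreasing order
    machines = []
    for _ in range(k):
        # step 2(a): largest remaining job of weight smaller than 1 - beta
        big = next((w for w in pool if w < 1 - beta), None)
        if big is None:
            raise ValueError("no job below 1 - beta left")
        pool.remove(big)
        # step 2(b): the two largest remaining jobs of weight smaller than 0.5
        small = [w for w in pool if w < 0.5][:2]
        if len(small) < 2:
            raise ValueError("fewer than two jobs below 0.5 left")
        for w in small:
            pool.remove(w)
        machines.append([big] + small)
    # step 3: the remaining 2m - 2k jobs go to m - k machines by the snake rule
    machines.extend(snake(pool, m - k))
    return machines


if __name__ == "__main__":
    jobs = [0.6, 0.45, 0.4, 0.39, 0.39, 0.39, 0.38]   # 2m + 1 jobs for m = 3
    print(big_jobs(jobs, 3, beta=0.38))
```

Lemma 8 is what guarantees that step 2(b) always finds two jobs below 0.5 on valid inputs; the `ValueError` branches only guard against inputs that violate the assumptions.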

Lemma 9. The “big jobs” algorithm provides an assignment which is globally $\frac{2-\beta}{3\beta}$-balanced, for $\frac{1}{3} < \beta < \frac{2}{5}$.

Proof. As can be seen above, the algorithm takes the largest possible triplets in any reasonable assignment, leaving the small jobs to the pairs. Let $P$ be the assignment created by the algorithm and let $P_i$ ($1 \le i \le k$) be the assignment to one machine created by iteration $i$ of the loop in the algorithm. Let $P_{rem}$ be the assignment of the algorithm to the remaining machines.

Every reasonable assignment must have at least $k$ triplets of jobs assigned to one machine. Let $Q$ be any reasonable assignment, and let $Q_i$ ($1 \le i \le k$) be any $k$ different sub-assignments of $Q$ to one machine, each machine having a triplet assigned to it. Let $Q_{rem}$ be the sub-assignment of $Q$ to the remaining machines. Each $P_i$ has load at most $(1 - \beta) + 0.5 + 0.5 = 2 - \beta$, while each $Q_i$ has load at least $3\beta$, so comparing each $P_i$ to $Q_i$ we get a ratio of at most $\frac{2-\beta}{3\beta}$. By Lemma 2, we get $\bigcup_{i=1}^{k} P_i \le_{\frac{2-\beta}{3\beta}} \bigcup_{i=1}^{k} Q_i$. If we now compare the rest of the $m - k$ machines, we see that in our assignment these machines hold the smallest jobs possible (since the largest possible jobs went to the triplets). In any other reasonable assignment these $m - k$ machines hold these jobs or bigger ones (bigger in the sense of comparing the values in the vector of the sorted weights one by one). Even if we assume that the other assignment has the same jobs as ours on these machines (the worst case), we have the same $2m - 2k$ jobs on $m - k$ machines. By Theorem 4, the snake assignment is globally $\frac{4}{3}$-balanced, and therefore $P_{rem} \le_{4/3} Q_{rem}$.


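Now by Lemma 1 we get

$$P = \Big(\bigcup_{i=1}^{k} P_i\Big) \cup P_{rem} \;\le_{\frac{2-\beta}{3\beta}}\; \Big(\bigcup_{i=1}^{k} Q_i\Big) \cup Q_{rem} = Q,$$

which completes the proof. □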

Finally, we conclude our main result.

Theorem 5. The all-norm algorithm produces an assignment which is globally $(1 + \beta)$-balanced, for $\beta \approx 0.3875$.

Proof. By Lemma 9 we have an algorithm that assigns jobs of weight between $\beta$ and 1 (where 1 is the average load of the machines) and provides an assignment which is globally $\frac{2-\beta}{3\beta}$-balanced, for $\frac{1}{3} < \beta < \frac{2}{5}$. We can plug the “big jobs” algorithm into step 6 of the all-norm algorithm, and by Theorem 3 we can return to the original input and get an assignment which is globally $\max\{\frac{2-\beta}{3\beta}, 1 + \beta\}$-balanced.

If we choose $\beta$ such that $\frac{2-\beta}{3\beta} = 1 + \beta$, we get an algorithm that provides an assignment which is globally 1.3875-balanced. In particular, this algorithm has an approximation ratio of 1.3875 for all $\ell_p$ norms simultaneously. □
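The choice of $\beta$ in Theorem 5 can be made explicit. The short derivation below assumes, as the comparison of triplets in the proof of Lemma 9 indicates, that the balance factor of the “big jobs” algorithm is $\frac{2-\beta}{3\beta}$: a triplet built in step 2 has load at most $(1 - \beta) + 0.5 + 0.5 = 2 - \beta$, while any triplet has load at least $3\beta$. Since $\frac{2-\beta}{3\beta}$ decreases and $1 + \beta$ increases in $\beta$, the maximum of the two is smallest where they meet:

$$\frac{2-\beta}{3\beta} = 1 + \beta \iff 2 - \beta = 3\beta + 3\beta^2 \iff 3\beta^2 + 4\beta - 2 = 0 \iff \beta = \frac{-4 + \sqrt{40}}{6} = \frac{\sqrt{10} - 2}{3} \approx 0.3874.$$

This value lies in the range $\frac{1}{3} < \beta < \frac{1}{2}$ assumed for the general case and agrees, up to rounding, with the values $\beta \approx 0.3875$ and $1 + \beta \approx 1.3875$ stated in Theorem 5.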
Abstract. We propose an approximation algorithm for the general max-min resource sharing problem with $M$ nonnegative concave constraints on a convex set $B$. The algorithm is based on a Lagrangian decomposition method and it uses a $c$-approximation algorithm (called approximate block solver) for a simpler maximization problem over the convex set $B$. We show that our algorithm achieves within $O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1}))$ iterations or calls to the approximate block solver a solution for the general max-min resource sharing problem with approximation ratio $c/(1 - \varepsilon)$. The algorithm is faster and simpler than the previously known approximation algorithms for the problem.



1 Introduction 

We consider the following general max-min resource sharing problem

$$(R) \qquad \lambda^* = \max\{\lambda \mid f(x) \ge \lambda e,\ x \in B\},$$

where $e$ is the vector of all ones and $f : B \to \mathbb{R}^M_+$ is a vector of $M$ nonnegative, continuous, concave functions $f_m$ defined on a nonempty convex compact set $B$ (called block). Without loss of generality we may assume that $\lambda^* > 0$ and $M \ge 2$. Let $\lambda(f(x)) = \min_{1 \le m \le M} f_m(x)$ for any given vector $x \in B$. If all functions $f_m$ are linear, the special case is called the general fractional covering problem.

In this paper we present an efficient algorithm which approximately solves $(R)$ by using the Lagrangian or price-directive decomposition method. Let $P = \{p \in \mathbb{R}^M \mid \sum_{m=1}^{M} p_m = 1,\ p_m \ge 0\}$ be the set of price vectors. This method associates a block problem to each instance of $(R)$:

$$\Lambda(p) = \max\{p^T f(x) \mid x \in B\},$$

where $p \in P$. Solving the block problem means maximizing a concave function over the convex set $B$. We suppose that for any price vector $p \in P$ there is a $(c, t)$-approximate block solver $ABS(p, c, t)$ which computes a vector $x = x(p) \in B$ such that $p^T f(x) \ge \frac{1}{c}(1 - t)\Lambda(p)$, where $t = O(\varepsilon)$. $ABS(p, c, t)$ defines a family of approximation algorithms with ratio close to $c \ge 1$.

In this paper we study the problem of finding a $(c, \varepsilon)$-approximate solution for $(R)$. The main result is an approximation algorithm that for any values $c \ge 1$ and $\varepsilon \in (0, 1)$ finds a solution for the following problem:

$$(R_{c,\varepsilon}) \qquad \text{compute } x \in B \text{ such that } f(x) \ge \Big[\frac{1}{c}(1 - \varepsilon)\lambda^*\Big] e.$$

Our algorithm requires only a small number of calls to the approximate block solver. The number of calls is called the coordination complexity. Our algorithm finds a solution of value at least $\frac{1}{c}(1 - \varepsilon)\lambda^*$, provided that there is an approximate block solver for $t = O(\varepsilon)$ and any $p \in P$. The coordination complexity, or number of calls, is at most $O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1}))$. Our new algorithm is faster and simpler than the previously known algorithms for the general max-min resource sharing and fractional covering problems.

Previous Results. Plotkin et al. [18] considered the linear feasibility variant of the fractional covering problem (with linear functions and approximation ratio $c = 1$): find a point $x \in B$ such that $f(x) = Ax \ge (1 - \varepsilon)b$, where $A$ is an $(M \times N)$ matrix and $b$ is an $M$-dimensional positive vector. The problem is solved in [18] by Lagrangian decomposition using exponential potential reductions. The number of iterations (calls to the corresponding block solver) in this algorithm is $O(M + \rho\ln^2 M + \varepsilon^{-2}\rho\ln(M\varepsilon^{-1}))$, where $\rho = \max_{1 \le m \le M}\max_{x \in B} a_m^T x/b_m$ is the width of $B$ relative to $Ax \ge b$. Könemann [15] also proposed an algorithm for the fractional covering problem that uses $O(M\rho\log_{1+\varepsilon}(\varepsilon^{-1}))$ iterations, where $\rho$ is also a data-dependent bound. Young [21] studied the fractional covering problem with general approximate block solvers (arbitrary ratio $c \ge 1$). He proposed an algorithm that uses $O(c\rho'\ln M/(\lambda^*\varepsilon^2))$ calls to the block solver for the fractional covering problem, where $\rho' = \max_{1 \le m \le M}\max_{x \in B} a_m^T x/b_m - \min_{1 \le m \le M} a_m^T x/b_m$ and $\lambda^*$ is the optimum value of the fractional covering problem. The first big step towards the general max-min resource sharing problem was made by Grigoriadis et al. [9]. They proposed an algorithm for this problem with standard block solvers (with approximation ratio $c = 1$) that uses only $O(M(\varepsilon^{-2} + \ln M))$ calls to the block solver, a bound that depends neither on the width $\rho$ nor on the optimal value $\lambda^*$. Jansen and
Porkolab [12] studied the general max-min resource sharing (and also fractional 
covering) problem with approximate block solvers (i.e. with concave functions 
fm and arbitrary ratio c > 1). They proposed an approximation algorithm that 
uses at most 0{M {In M + -I- e“^lnc)) iterations — a bound that depends 

on the approximation ratio $c$. In typical instances we have $c = O(1)$ (e.g. with a constant approximation ratio $c$), $c = \Theta(\ln M)$ or $c = O(M)$ (with a dependence on the number of constraints $M$). Recently we [11] found an approximation algorithm that needs $O(M\varepsilon^{-2}\ln(M\varepsilon^{-1}))$ iterations. This algorithm has to store $O(M)$ solutions in each scaling phase and it computes a convex combination of these solutions at the end of each phase.

Related problems. Closely related problems are the fractional packing problem [4,6,18,21] and the min-max resource sharing problem [7,8,13,20]. Grigoriadis and Khachiyan [8] studied the min-max resource sharing problem (the generalization of the fractional packing problem with convex functions) and proved that this problem (with ratio $c = 1$) can be solved in $O(M(\varepsilon^{-2}\ln\varepsilon^{-1} + \ln M))$ iterations (or calls to a block solver). The method uses either the exponential or the standard logarithmic potential function [7,8] and is based on a ternary search procedure that repeatedly scales the functions $f_m$. Villavicencio and Grigoriadis [20] proposed a modified logarithmic potential function to avoid the scaling phases and to simplify the analysis. The number of iterations used in [20] is also $O(M(\varepsilon^{-2}\ln\varepsilon^{-1} + \ln M))$. Garg and Könemann [6] found an algorithm for the fractional packing problem (with $c = 1$) that needs $O(M\varepsilon^{-2}\ln M)$ iterations. For a nice survey on the fractional packing problem and implementation issues we refer to [2]. Young [22] proposed an approximation algorithm for a mixed linear packing and covering problem (with linear functions, $c = 1$ and restricted block $B = \mathbb{R}^N_+$) with running time $O(Md\varepsilon^{-2}\ln M)$, where $d$ is the maximum number of constraints in which any variable appears. Recently, Fleischer [5] gave an approximation scheme for the optimization variant (minimizing $c^T x$ such that $Cx \ge b$, $x \le a$ and $x \ge 0$, where $a$, $b$, and $c$ are nonnegative integer vectors and $C$ is a nonnegative integer matrix). A coordination complexity of $O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1}))$, independent of the data and of the approximation ratio, was given for the general min-max resource sharing problem by Jansen and Zhang [13]. Notice that the fractional covering problem is more complicated than the packing variant (since the underlying logarithmic potential function is bounded for the packing problem).

New Results. In this paper we propose an approximation algorithm for the general max-min resource sharing problem that uses only $O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1}))$ iterations or calls to the approximate block solver. Each coordination step requires a call to the approximate block solver $ABS(p, c, t)$ where $t = O(\varepsilon)$, and it incurs an overhead of $O(M\ln\ln(M\varepsilon^{-1}))$ arithmetic operations. This improves the coordination complexity of the previously known algorithms for the general max-min resource sharing problem. It matches the best bound for the general min-max resource sharing problem [8,13,20]. In addition, the algorithm is less complex than the previous one: it does not have to store $O(M)$ solutions in each scaling phase, and the computation of the convex combination at the end of the phase can be avoided.

Applications. Applications of the max-min resource sharing problem can be found in [1,2,3,10,12,14,16,17,18,19,22]. Many combinatorial optimization problems (e.g. fractional bin packing, fractional strip packing, preemptive resource constrained scheduling, preemptive malleable task scheduling, fractional graph coloring, and fractional path coloring) can be modelled as a max-min resource sharing problem with an exponential number $N$ of variables and a polynomial number $M$ of constraints [1,2,3,10,12,14,18,19]. In these applications the block problem (e.g. classical unbounded knapsack, $s$-dimensional knapsack, multiple-choice knapsack, weighted independent set and weighted edge-disjoint paths) is computationally hard to solve or to approximate. On the other hand, there are approximation algorithms that solve the underlying block problem and can be used to solve the corresponding max-min resource sharing problem approximately. The total running time of these algorithms is dominated by the number of iterations and the running time of the approximate block solver. Our result above implies an improved running time of $O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1})(BL(M, c, \varepsilon) + M\ln\ln(M\varepsilon^{-1})))$ for these applications, where $BL(M, c, \varepsilon)$ is the running time of the block solver.

Main Ideas. The algorithm is based on the duality

$$\lambda^* = \max_{x \in B}\min_{p \in P} p^T f(x) = \min_{p \in P}\max_{x \in B} p^T f(x).$$

This implies that $\lambda^* = \min\{\Lambda(p) \mid p \in P\}$ (the minimum objective value over all block problems). We use the Lagrangian decomposition or price-directive method [7,8,18]. This method solves $(R)$ approximately via its Lagrangian dual by computing a sequence of vectors $x_0, x_1, \ldots, x_N \in B$. In one Lagrangian decomposition step we

(1) compute a price vector $p = p(f(x_i)) \in P$ for the current vector $x_i \in B$,

(2) call a block solver as an oracle to get an approximate solution $\hat{x} \in B$ of the block problem $\max\{p^T f(x) \mid x \in B\}$,

(3) and move from vector $x_i$ to $x_{i+1} = (1 - \tau)x_i + \tau\hat{x}$ with an appropriate step length $\tau \in (0, 1)$.

We use the logarithmic potential function

$$\Phi_t(\theta, f(x)) = \ln\theta + \frac{t}{M}\sum_{m=1}^{M}\ln(f_m(x) - \theta)$$

to compute the price vector $p(f(x))$. The potential function has a unique maximizer $\theta(f(x))$ for each $x \in B$. The reduced potential value $\phi_t(f(x)) = \Phi_t(\theta(f(x)), f(x))$ approximates $\ln\lambda(f(x))$, the logarithm of the objective function. In addition, the value $\phi_t(f(x))$ measures the improvement of the solutions. In fact, one can prove that the reduced potential values are increasing: $\phi_t(f(x_0)) < \phi_t(f(x_1)) < \cdots < \phi_t(f(x_N))$. The algorithm is further based on the scaling phase strategy [9,18], where one improves the quality of the solutions in different phases $s \ge 1$. Since the potential function could be extremely large, the convergence of the general method depends on the approximation ratio $c \ge 1$.

In order to speed up the convergence, we modify the logarithmic potential function above (see next section) and stepwise eliminate functions larger than a threshold value $T(s)$. In a previous algorithm we had to store several solutions during a phase and to compute a convex combination at the end of the phase [11]. To avoid this and to speed up the algorithm, we use the following ideas. First we give better lower and upper bounds for the reduced potential values (see Lemma 2). These bounds are used to estimate the difference of the reduced potential values for two arbitrary vectors computed in one phase. Next we modify the step lengths $\tau(s)$ and the threshold values $T(s)$, which both depend on the phase $s$ of the algorithm. Using these modified values we are able to show that (a) the reduced potential values are still increasing during the phases and (b) the function values of the final solution are large enough for every function (including the eliminated functions).



2 Logarithmic Potential Function 

In this section we describe the modified potential function and give the definition of the price vector $p \in P$. Grigoriadis et al. [9] associated with the vector $x \in B$ the standard logarithmic potential function $\Phi_t(\theta, f(x)) = \ln\theta + \frac{t}{M}\sum_{m=1}^{M}\ln(f_m(x) - \theta)$, where $\theta \in \mathbb{R}$, $f(x) = (f_1(x), \ldots, f_M(x))$ is the vector of function values for $x \in B$ and $t > 0$ is a tolerance that depends on $\varepsilon$ (i.e. $t = \varepsilon/6$). In general the potential function can be extremely large, since there are no bounds for the function values $f_m(x)$.



2.1 Modified Potential Function 



In order to bound the potential function and, later, the number of iterations, we modify the above potential function as follows. Let $A$ be a nonempty subset of $\mathcal{M} = \{1, \ldots, M\}$, and let $T$ be a threshold value (specified later). During a phase of the algorithm, we eliminate a component $m \in \mathcal{M}$ if the corresponding function value $f_m(x)$ is larger than the threshold value $T$. The index set of the noneliminated functions for a given vector $x \in B$ is specified by this subset $A = A(x)$. In addition we replace the parameter $t$ by $t/8$. Then our modified potential function has the form:

$$\Phi_t(\theta, f(x), A) = \ln\theta + \frac{t}{8M}\sum_{m \in A}\ln(f_m(x) - \theta) + \frac{t}{8M}\sum_{m \in \mathcal{M}\setminus A}\ln T.$$

The function $\Phi_t$ is well defined for $0 < \theta < \lambda(f(x), A)$, where $\lambda(f(x), A) = \min\{f_m(x) \mid m \in A\}$.



The potential function $\Phi_t$ is used to determine the price vector $p = p(f(x), A)$ in a very natural way (see Section 2.2). The maximizer $\theta = \theta(f(x), A)$, by the first-order optimality condition, is the solution of the equation

$$\frac{t\,\theta}{8M}\sum_{m \in A}\frac{1}{f_m(x) - \theta} = 1. \qquad (1)$$

This equation has a unique root since the function $g(\theta) = \frac{t\theta}{8M}\sum_{m \in A}\frac{1}{f_m(x) - \theta}$ is strictly increasing for $\theta$ within the interval $(0, \lambda(f(x), A))$, with $g(\theta) \to 0$ as $\theta \to 0$ and $g(\theta) \to \infty$ as $\theta \to \lambda(f(x), A)$. We can prove the following bounds:

Lemma 1.

$$\Big(1 + \frac{t}{8M}\Big)\theta(f(x), A) \le \lambda(f(x), A) \le \Big(1 + \frac{t}{8}\Big)\theta(f(x), A).$$

Lemma 1 shows that the maximizer $\theta(f(x), A)$ approximates $\lambda(f(x), A)$ for small values of $t$. For a given vector $f(x) = (f_1(x), \ldots, f_M(x))$ of function values and an index set $A$ we define the reduced potential function $\phi_t(f(x), A) = \Phi_t(\theta(f(x), A), f(x), A)$. The reduced potential function $\phi_t(f(x), A)$ approximates the value $\ln\theta(f(x), A)$ and the objective value $\ln\lambda(f(x), A)$.

Lemma 2. Let $f_m(x) \le T$ for each $m \in A$. Then we have

$$\phi_t(f(x), A) \le \ln\theta(f(x), A) + \frac{t}{8}\ln T,$$

$$\phi_t(f(x), A) \ge \Big(1 + \frac{|A|t}{8M}\Big)\ln\theta(f(x), A) + \frac{|A|t}{8M}\ln\frac{|A|t}{8M} + \frac{(M - |A|)t}{8M}\ln T.$$

Remark: The bounds in Lemma 2 will be used later to bound the difference $\phi_t(f(x'), A(x')) - \phi_t(f(x), A(x))$ for two arbitrary computed vectors $x$ and $x'$ in a phase of the algorithm.
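Because $g$ is increasing on $(0, \lambda(f(x), A))$, the root of (1) can be found by bisection. The sketch below (function names are ours, not the paper's) computes $\theta(f(x), A)$ and the reduced potential $\phi_t(f(x), A)$ for a given vector of function values, and checks the bounds of Lemma 1 and Lemma 2 numerically. The lower bound of Lemma 1 follows from (1) because each summand $\frac{t\theta}{8M(f_m(x) - \theta)}$ is at most 1; the upper bound follows because at least one summand is at least $1/|A|$.

```python
import math


def theta(f, A, t):
    """Bisection for the root of equation (1):
    (t*theta/(8M)) * sum_{m in A} 1/(f_m - theta) = 1 on (0, lambda(f, A))."""
    M = len(f)
    lam = min(f[m] for m in A)

    def g(th):  # strictly increasing on (0, lam)
        return t * th / (8 * M) * sum(1.0 / (f[m] - th) for m in A)

    lo, hi = 0.0, lam
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):        # interval cannot be halved any further
            break
        if g(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reduced_potential(f, A, t, T):
    """phi_t(f, A) = Phi_t(theta(f, A), f, A) for the modified potential."""
    M = len(f)
    th = theta(f, A, t)
    total = sum(math.log(f[m] - th) for m in A) + (M - len(A)) * math.log(T)
    return math.log(th) + t / (8 * M) * total, th


def check_lemmas_1_and_2(f, t, T, tol=1e-9):
    """Numerically check the bounds of Lemma 1 and Lemma 2 on one vector f."""
    M = len(f)
    A = [m for m in range(M) if f[m] <= T]   # noneliminated components
    phi, th = reduced_potential(f, A, t, T)
    lam = min(f[m] for m in A)
    a = len(A) * t / (8 * M)
    lemma1 = th * (1 + t / (8 * M)) <= lam + tol and lam <= th * (1 + t / 8) + tol
    upper = math.log(th) + t / 8 * math.log(T)
    lower = ((1 + a) * math.log(th) + a * math.log(a)
             + (M - len(A)) * t / (8 * M) * math.log(T))
    return lemma1 and lower - tol <= phi <= upper + tol


if __name__ == "__main__":
    print(check_lemmas_1_and_2([1.0, 1.2, 1.5, 3.0, 10.0], t=0.1, T=5.0))
```

Bisection is only one possible choice; Newton's method on (1) converges faster, but bisection needs nothing beyond the monotonicity of $g$ stated above.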



2.2 The Price Vector 



The price vector $p(f(x), A) = (p_1(f(x), A), \ldots, p_M(f(x), A))$ for a vector $x \in B$ and an index set $A$ is defined from equation (1) as follows:

$$p_m(f(x), A) = \begin{cases} \dfrac{t}{8M}\,\dfrac{\theta(f(x), A)}{f_m(x) - \theta(f(x), A)} & \text{for } m \in A, \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

The price vector $p(f(x), A)$ is defined so as to optimize only in the direction of the subset $A$: for $m \notin A$ the price component $p_m(f(x), A) = 0$. The idea behind this is the following: if a component $f_m(x)$ is larger than the threshold value $T$, then we should not optimize in this direction. According to equation (1), each price component $p_m(f(x), A)$ is nonnegative and $\sum_{m=1}^{M} p_m(f(x), A) = 1$. In our algorithm we simply compute $p(f(x), A)$ from (2). We need the following property.



Lemma 3.

$$p(f(x), A)^T f(x) = \Big(1 + \frac{t|A|}{8M}\Big)\theta(f(x), A) \le \Big(1 + \frac{t}{8}\Big)\theta(f(x), A).$$
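Given $\theta$, the price vector (2) is a one-liner. The sketch below (names are illustrative) repeats the bisection for $\theta$ so that it is self-contained, and checks the two facts stated above: the prices sum to 1, as equation (1) demands, and $p^T f(x) = (1 + t|A|/(8M))\theta$, which is the identity in Lemma 3. The identity holds because $\sum_{m \in A} p_m(f_m(x) - \theta) = |A|t\theta/(8M)$ and $\sum_m p_m = 1$.

```python
def theta(f, A, t):
    """Bisection for the root of equation (1) on (0, lambda(f, A))."""
    M = len(f)
    lo, hi = 0.0, min(f[m] for m in A)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):
            break
        g = t * mid / (8 * M) * sum(1.0 / (f[m] - mid) for m in A)
        lo, hi = (mid, hi) if g < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)


def price_vector(f, A, t):
    """Price vector (2): p_m = (t/(8M)) * theta / (f_m - theta) on A, 0 elsewhere."""
    M = len(f)
    A = set(A)
    th = theta(f, A, t)
    p = [t / (8 * M) * th / (f[m] - th) if m in A else 0.0 for m in range(M)]
    return p, th


if __name__ == "__main__":
    f, t = [1.0, 1.2, 1.5, 3.0, 10.0], 0.1
    A = [0, 1, 2, 3]                          # component 4 treated as eliminated
    p, th = price_vector(f, A, t)
    pf = sum(pm * fm for pm, fm in zip(p, f))
    print(sum(p))                             # equation (1): prices sum to 1
    print(pf, (1 + t * len(A) / (8 * len(f))) * th)   # both sides of Lemma 3
```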
3 The Approximation Algorithm

In this section we present an approximation algorithm for the general max-min resource sharing problem. Assuming the existence of a $(c, t)$-approximate block solver for $t = O(\varepsilon)$ and any $p \in P$, we are able to solve the problem $(R_{c,\varepsilon})$ efficiently. Our main result is the following.

Theorem 1. For any given $c \ge 1$ and $\varepsilon \in (0, 1)$, there is an approximation algorithm for the general max-min resource sharing problem that computes a solution whose objective function value is at least $\frac{1}{c}(1 - \varepsilon)\lambda^*$, provided that there is a polynomial time block solver $ABS(p, c, t)$ for $t = O(\varepsilon)$ and any price vector $p \in P$. The number of iterations or coordination steps can be bounded by $N = O(M(\ln M + \varepsilon^{-2}\ln\varepsilon^{-1}))$, where each coordination step requires a call to the approximate block solver and incurs an overhead of $O(M\ln\ln(M\varepsilon^{-1}))$ arithmetic operations.

The approximation algorithm that solves the general max-min resource sharing problem works as follows.

(1) compute initial solution $x^{(0)}$; $s := 0$; $\varepsilon_0 := 1/4$;

(2) repeat {scaling phase}

(2.1) $s := s + 1$; $\varepsilon_s := \varepsilon_{s-1}/2$; $x := x^{(s-1)}$; finished := false;

(2.2) compute threshold $T(s)$; set $A := \{m \in \{1, \ldots, M\} \mid f_m(x) \le T(s)\}$;

(2.3) while not(finished) do begin

(2.3.1) compute $\theta(f(x), A)$ and $p(f(x), A)$;

(2.3.2) $\hat{x} := ABS(p(f(x), A), c, \varepsilon_s/6)$;

(2.3.3) if stopping rule 1 or 2 is satisfied
then begin finished := true; $x^{(s)} := x$ end
else begin

(2.3.3.1) compute step length $\tau(s)$ and set $x' := (1 - \tau(s))x + \tau(s)\hat{x}$;

(2.3.3.2) if $\max_{m \in A} f_m(x)(1 - \tau(s)) + \tau(s)f_m(\hat{x}) > T(s)$ then reduce $\tau(s)$ to $\tau'$ and set $x' := (1 - \tau')x + \tau'\hat{x}$;

(2.3.3.3) $A := A \setminus \{m \mid f_m(x') > T(s)\}$; $x := x'$;

end

end

(2.4) until $\varepsilon_s \le \varepsilon$;

(3) return($x^{(s)}$).

To prove the main theorem we give the details of the approximation algorithm in the next subsections. In subsections 3.1 and 3.2 we describe how to compute an initial solution and how to define the stopping rules. Next, in subsection 3.3, we describe the computation in one scaling phase (and define the threshold values $T(s)$ and the step lengths $\tau(s)$). In addition we describe the reduction of the step length from $\tau(s)$ to $\tau'$. In subsection 3.4 we calculate the number of iterations. Finally, in subsection 3.5 (using the choice of $T(s)$ and $\tau(s)$) we show that the function values $f_m(x^{(s)})$ of the final solution $x^{(s)}$ are larger than the desired objective value $\frac{1}{c}(1 - \varepsilon_s)\lambda^*$ for any $m \in \{1, \ldots, M\}$ (including the eliminated indices $m \notin A(x^{(s)})$).
3.1 Initial Solution

For the initial solution of the first phase, let $x^{(0)} = \frac{1}{M}\sum_{m=1}^{M}\hat{x}^{(m)}$, where $\hat{x}^{(m)}$ is the solution computed by $ABS(e_m, c, t)$ for the unit vector $e_m$ with all zero coordinates except for its $m$-th component, which is 1, and let $t = 1/2$. The next lemma provides a bound on $\lambda(f(x^{(0)})) = \min_{1 \le m \le M} f_m(x^{(0)})$.

Lemma 4. For each $p \in P$, $\lambda^* \le \Lambda(p) \le 2cM\,p^T f(x^{(0)})$. Furthermore, $f_m(x^{(0)}) \ge \frac{\lambda^*}{2Mc}$ for each $m = 1, \ldots, M$.
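The second claim of Lemma 4 can be read off from a single chain of inequalities; we sketch it as a reading aid, assuming that $x^{(0)}$ is the average of the $M$ block solutions $\hat{x}^{(m)}$ returned by $ABS(e_m, c, 1/2)$. The chain uses that $f_m$ is concave and nonnegative, the guarantee of the block solver with $t = 1/2$, and $\Lambda(e_m) \ge \lambda^*$ from the first claim:

$$f_m(x^{(0)}) \ge \frac{1}{M}\sum_{j=1}^{M} f_m(\hat{x}^{(j)}) \ge \frac{1}{M} f_m(\hat{x}^{(m)}) = \frac{1}{M}\, e_m^T f(\hat{x}^{(m)}) \ge \frac{1}{M}\cdot\frac{1}{c}\Big(1 - \frac{1}{2}\Big)\Lambda(e_m) \ge \frac{\lambda^*}{2Mc}.$$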



3.2 Stopping Rules 



In our algorithm we use two stopping rules. The inequality in Lemma 3 implies that the maximizer $\theta(f(x), A)$ approximates the scalar product $p(f(x), A)^T f(x)$. This fact will be used for the first stopping rule. For simplicity let $p = p(f(x), A)$. For the first rule we define a parameter

$$\nu = \nu(x, \hat{x}) = \frac{p^T f(\hat{x}) - p^T f(x)}{p^T f(\hat{x}) + p^T f(x)}, \qquad (3)$$

similar to [9], that measures the distance between $p^T f(\hat{x})$ and $p^T f(x)$, where $\hat{x} \in B$ is an approximate block solution produced by $ABS(p(f(x), A), c, t)$. Notice that $\nu(x, \hat{x}) \le 1$. The next lemma states that $x$ solves $(R_{c,\varepsilon})$ when $\nu$ and $t$ are both of order $\varepsilon$.

Lemma 5. Suppose $\varepsilon \in (0, 1)$ and $t = \varepsilon/6$. For a given $x \in B$, let $p = p(f(x), A) \in P$ as defined in (2) and let $\hat{x}$ be computed by $ABS(p, c, t)$. If $\nu(x, \hat{x}) \le t$, then the vector $x$ satisfies $f_m(x) \ge \frac{1}{c}(1 - \varepsilon)\lambda^*$ for each $m \in A$.

In our algorithm we gradually increase the accuracy from the initial solution $x^{(0)}$ to $\frac{1}{c}(1 - \varepsilon)$ for the final solution. In the first phase we achieve an accuracy of at least $\frac{1}{c}(1 - \varepsilon_1)$, where $\varepsilon_1 = 1/8$. In the other phases we set $\varepsilon_s = \varepsilon_{s-1}/2$ (for $s \ge 2$) and increase the accuracy to at least $\frac{1}{c}(1 - \varepsilon_s)$. The main goal in each phase $s \ge 1$ is to obtain a solution $x \in B$ with $f_m(x) \ge \frac{1}{c}(1 - \varepsilon_s)\lambda^*$ for each $m \in \{1, \ldots, M\}$. If a phase stops with $\nu(x, \hat{x}) \le t_s$ and $t_s = \varepsilon_s/6$, then, by Lemma 5, $f_m(x) \ge \frac{1}{c}(1 - \varepsilon_s)\lambda^*$ for each $m \in A$. For the eliminated components $m \notin A$ we prove later that $f_m(x)$ is still large enough (see subsection 3.5).

The second stopping rule is used to bound the number of coordination steps during one phase. Here we use a parameter $\omega_s$ defined by

$$\omega_s = \begin{cases} \dfrac{2M}{1 + \varepsilon_1} & \text{for } s = 1, \\ \dfrac{1 + 2\varepsilon_s}{1 + \varepsilon_s/2} & \text{for } s \ge 2. \end{cases} \qquad (4)$$

Let $x^{(s-1)}$ be the solution of the $(s-1)$th phase for $s \ge 1$ and $x$ be an arbitrary solution in phase $s$. Then the two stopping rules are:

Rule 1: $\nu(x, \hat{x}) \le t_s$,

Rule 2: $\lambda(f(x), A) \ge \omega_s\,\lambda(f(x^{(s-1)}), \mathcal{M})$.

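The next lemma shows what happens if the second stopping rule is satisfied.

Lemma 6. Let $x^{(s-1)}$ be an initial and $x$ a final solution of phase $s \ge 1$ with $\lambda(f(x), A) \ge \omega_s\lambda(f(x^{(s-1)}), \mathcal{M})$ and $A \subseteq \mathcal{M}$. If $\lambda(f(x^{(s-1)}), \mathcal{M}) \ge \frac{\lambda^*}{2Mc}$ for $s = 1$ and $\lambda(f(x^{(s-1)}), \mathcal{M}) \ge \frac{1}{c}(1 - \varepsilon_{s-1})\lambda^*$ for $s \ge 2$, then $f_m(x) \ge \frac{1}{c}(1 - \varepsilon_s)\lambda^*$ for each $m \in A$.

Note that we can ensure the preconditions for $\lambda(f(x^{(s-1)}), \mathcal{M})$ by Lemma 10.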
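To see why the two cases of $\omega_s$ suffice, note that $f_m(x) \ge \lambda(f(x), A)$ for every $m \in A$, so Lemma 6 reduces to an inequality in $\omega_s$. A short check, assuming the reading $\omega_1 = 2M/(1 + \varepsilon_1)$ and $\omega_s = (1 + 2\varepsilon_s)/(1 + \varepsilon_s/2)$ for $s \ge 2$, and using $\varepsilon_{s-1} = 2\varepsilon_s$:

$$s = 1: \quad f_m(x) \ge \omega_1\,\frac{\lambda^*}{2Mc} = \frac{1}{1 + \varepsilon_1}\cdot\frac{\lambda^*}{c} \ge \frac{1}{c}(1 - \varepsilon_1)\lambda^*,$$

$$s \ge 2: \quad f_m(x) \ge \frac{(1 + 2\varepsilon_s)(1 - 2\varepsilon_s)}{1 + \varepsilon_s/2}\cdot\frac{\lambda^*}{c} = \frac{1 - 4\varepsilon_s^2}{1 + \varepsilon_s/2}\cdot\frac{\lambda^*}{c} \ge \frac{1}{c}(1 - \varepsilon_s)\lambda^*.$$

The last step is equivalent to $1 - 4\varepsilon_s^2 \ge 1 - \varepsilon_s/2 - \varepsilon_s^2/2$, i.e. $\varepsilon_s \le 1/7$, which holds because $\varepsilon_s \le 1/16$ for $s \ge 2$.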

3.3 Computation in One Phase 

In this section we describe how to go from a $\frac{1}{c}(1 - \varepsilon_{s-1})$-approximate solution to a $\frac{1}{c}(1 - \varepsilon_s)$-approximate solution. In each phase $s$ the relative error tolerance $\varepsilon_s$ is halved. The threshold value is defined as follows:

Tt A = / A(/(xW), M)/(l + ei/48) for s = 1, 

^ ^ \ (6/eg)33 A(/(a:(^-i)), 7W)/(1 + Cg/48) for s > 2. 

Starting with the solution $x = x^{(s-1)}$ of the previous phase, we eliminate all components $m$ with $f_m(x) > T(s)$ and set $A = \{m \in \{1, \ldots, M\} \mid f_m(x) \le T(s)\}$. In the algorithm we compute $\theta(f(x), A)$ and the corresponding price vector $p = p(f(x), A)$. Based on the price vector $p$ we compute a solution $\hat{x}$ via the approximate block solver $ABS(p, c, \varepsilon_s/6)$. Then we compute a linear combination of the old solution $x$ and $\hat{x}$, i.e. we set $x' = (1 - \tau(s))x + \tau(s)\hat{x}$ for an appropriate step length $\tau(s) \in (0, 1)$. As step length we take

$$\tau(s) = \frac{t_s\,\theta\,\nu}{16M(p^T f(x) + p^T f(\hat{x}))},$$

where $t_s = \varepsilon_s/6$ and $\theta = \theta(f(x), A)$. Since $p^T f(x) = (1 + t_s|A|/(8M))\theta > 0$ and $p^T f(\hat{x}) > 0$, we have $\theta/(p^T f(x) + p^T f(\hat{x})) < 1$ and $\tau(s) < t_s/(16M)$. Let $\gamma(\alpha) = \max_{m \in A} f_m(x)(1 - \alpha) + \alpha f_m(\hat{x})$ for $0 \le \alpha \le 1$. If $\gamma(\tau(s)) \le T(s)$, then we use $x'$ as the next iterate and set $A' = \{m \in A \mid f_m(x') \le T(s)\}$. It may happen that $f_m(x') > T(s)$ although $T(s) \ge f_m(x)(1 - \tau) + \tau f_m(\hat{x})$ (since the functions $f_m$ are concave). But if $\gamma(\tau(s)) > T(s)$, then we reduce the step length $\tau(s)$. In this case we compute $\tau'$ such that $\tau' < \tau(s)$ and $\gamma(\tau') = T(s)$. Notice that $\gamma(\tau(s)) > T(s)$ implies $f_m(\hat{x}) > T(s)$ for at least one $m \in A$ (since $f_m(x) \le T(s)$ for each $m \in A$). Furthermore, the unique value $\tau'$ can be computed in $O(M)$ time (and this running time can be neglected compared to the overhead of the iteration). In this case we use $x' = (1 - \tau')x + \tau'\hat{x}$ as the new iterate and set $A' = \{m \in A \mid f_m(x') \le T(s)\}$.

The next important step is to measure the increase or decrease in the potential function. We suppose that $\nu(x, \hat{x}) > t_s$; otherwise phase $s$ would stop with solution $x$. We consider two cases depending on whether we have eliminated a component or not. In the first case we use the original step length $\tau(s)$ (otherwise we would have $A \ne A'$, a contradiction).
Lemma 7. For any two consecutive iterations in a phase with computed vectors $x, x'$ and index sets $A = A'$, we obtain

$$\phi_{t_s}(f(x'), A') - \phi_{t_s}(f(x), A) \ge \frac{t_s\,\nu(x, \hat{x})\,(\nu(x, \hat{x}) - t_s/2)}{16M} \ge \frac{t_s^3}{32M}.$$

For the proof of Lemma 7 we refer to the full version of the paper. The other case, with $A \ne A'$, is more complex. Here we eliminate at least one component. We show that the potential value does not decrease, which is important for the analysis of our algorithm. Note that this case occurs at most $M - 1$ times during one phase.

Lemma 8. For any two consecutive iterations in a phase with computed vectors $x, x'$ and index sets $A \ne A'$, we obtain

$$\phi_{t_s}(f(x'), A') - \phi_{t_s}(f(x), A) \ge 0.$$

For the proof of Lemma 8 we also refer to the full version of the paper.
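To make one coordination step concrete, the sketch below runs steps (2.3.1) to (2.3.3.1) on a toy fractional covering instance. Everything about the instance is an assumption for illustration: $f(x) = Cx$ for a small nonnegative matrix `mat`, $B$ the probability simplex, and an exact block solver ($c = 1$), which simply picks the best vertex. Threshold elimination is not modeled ($A = \{1, \ldots, M\}$, so the $\ln T$ terms drop out), and the step length uses $\tau = t\theta\nu/(16M(p^T f(x) + p^T f(\hat{x})))$; its numerator is only partly legible in the extracted text, so treat that formula as our reading. The final check confirms, on this instance, that the reduced potential increases, as Lemma 7 asserts when $A = A'$.

```python
import math


def theta(f, t):
    """Root of equation (1) with A = {1, ..., M} (nothing eliminated)."""
    M = len(f)
    lo, hi = 0.0, min(f)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid in (lo, hi):
            break
        g = t * mid / (8 * M) * sum(1.0 / (fm - mid) for fm in f)
        lo, hi = (mid, hi) if g < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)


def phi(f, t):
    """Reduced potential for A = {1, ..., M}; the ln T terms are absent."""
    M = len(f)
    th = theta(f, t)
    return math.log(th) + t / (8 * M) * sum(math.log(fm - th) for fm in f)


def matvec(mat, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in mat]


def coordination_step(mat, x, t):
    """One step for the toy instance f(x) = mat @ x, B = probability simplex,
    using an exact block solver (c = 1). Returns (x_new, nu, tau)."""
    M, N = len(mat), len(x)
    f = matvec(mat, x)
    th = theta(f, t)
    p = [t / (8 * M) * th / (fm - th) for fm in f]             # prices (2)
    # Exact block solver: p^T mat x is linear, so a vertex e_j is optimal.
    scores = [sum(p[m] * mat[m][j] for m in range(M)) for j in range(N)]
    j = max(range(N), key=lambda i: scores[i])
    x_hat = [1.0 if i == j else 0.0 for i in range(N)]
    pf, pf_hat = sum(pm * fm for pm, fm in zip(p, f)), scores[j]
    nu = (pf_hat - pf) / (pf_hat + pf)                         # parameter (3)
    if nu <= t:                                                # stopping rule 1
        return x, nu, 0.0
    tau = t * th * nu / (16 * M * (pf + pf_hat))               # assumed step length
    x_new = [(1 - tau) * xi + tau * xh for xi, xh in zip(x, x_hat)]
    return x_new, nu, tau


if __name__ == "__main__":
    mat = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
    x, t = [0.8, 0.1, 0.1], 0.05
    x_new, nu, tau = coordination_step(mat, x, t)
    print(nu, tau)
    print(phi(matvec(mat, x), t), phi(matvec(mat, x_new), t))
```

Because $\tau < t/(16M)$, each step moves $x$ only slightly; this is why the iteration count in Lemma 9 grows with $1/t_s$.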

3.4 Number of Iterations 

If our algorithm halts after all phases with a solution $y$ and index set $A(y)$, then $\lambda(f(y), A(y)) \ge \frac{1}{c}(1 - \varepsilon)\lambda^*$. Next we calculate the number $N_s$ of coordination steps performed in a single scaling phase $s \ge 1$. This implies an upper bound for all scaling phases. In subsection 3.5 we consider the eliminated indices $m$ and prove that $f_m(y) \ge \frac{1}{c}(1 - \varepsilon)\lambda^*$ for each $m \notin A(y)$. This proves the main Theorem 1. The proof of the following lemma can be found in the full version.

Lemma 9. The number of iterations $N_s$ in phase $s$ can be bounded as follows:

( 32Mtf ^[{4+ {7 /8)ti)ln M] = 0{M In M) for s = 1, 

® “ 1^ 32MtJ^[(4 -I- (33/8))ts In for s >2. 

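Summing over all scaling phases, the total number of coordination steps is

$$N = O\Big(M\Big(\ln M + \ln(\epsilon^{-1}) \sum_{k=0}^{\lceil \log \epsilon^{-1} \rceil} (2^k)^2\Big)\Big).$$

The sum $\sum_k (2^k)^2$ is bounded by $O(\epsilon^{-2})$. Therefore, the total number of iterations is

$$O(M(\ln M + \epsilon^{-2}\ln \epsilon^{-1})).$$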
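The step from the sum to $O(\epsilon^{-2})$ is a geometric-series bound: with $k_{\max} = \lceil \log_2 \epsilon^{-1} \rceil$ we have $2^{k_{\max}} < 2/\epsilon$, so $\sum_{k=0}^{k_{\max}} 4^k = (4^{k_{\max}+1}-1)/3 < (16/3)\,\epsilon^{-2}$. The small Python check below illustrates this numerically; reading the summation limit as a base-2 logarithm is our assumption, and the helper names are hypothetical.

```python
import math

def geometric_phase_sum(eps: float) -> int:
    """Sum of (2**k)**2 for k = 0, ..., ceil(log2(1/eps)) (assumed phase range)."""
    k_max = math.ceil(math.log2(1.0 / eps))
    return sum((2 ** k) ** 2 for k in range(k_max + 1))

def bound(eps: float) -> float:
    # 2**k_max < 2/eps, hence (4**(k_max + 1) - 1) / 3 < (16/3) / eps**2
    return 16.0 / (3.0 * eps ** 2)

for eps in (0.5, 0.1, 0.01):
    print(eps, geometric_phase_sum(eps), round(bound(eps), 1))
```

The constant 16/3 is only what this particular argument yields; the text needs nothing beyond the $O(\epsilon^{-2})$ order.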



3.5 Eliminated Functions 

Since each function $f_m$ is concave and nonnegative, $f_m(x') \ge (1-\tau) f_m(x) + \tau f_m(\hat{x}) \ge (1-\tau) f_m(x)$ for two consecutive solutions $x$ and $x'$ (and any step length $\tau \in (0, 1)$). Since $\tau \le \tau(s)$, or equivalently $1 - \tau \ge 1 - \tau(s)$, the value $f_m(x')$ of any function is at least $(1-\tau(s))$ times its previous value $f_m(x)$. Using the choice of $\tau(s)$ and the bounds on the numbers $N_s$ of iterations during a phase $s$, we can prove the following result:
Lemma 10. Let $x^{(s)}$ be the final solution of phase $s$. If $\ldots$ for $s = 1$, and $\lambda(f(x^{(s-1)})) \ge \frac{1}{c}(1-\epsilon_{s-1})\lambda^*$ for $s \ge 2$, then

$$f_m(x^{(s)}) \ge \frac{1}{c}(1-\epsilon_s)\lambda^*$$

for each $m \in \{1, \ldots, M\}$.

Remark: The root $\theta(f(x), A(x))$ can often be computed only approximately, but an accuracy of $O(\epsilon^2/M)$ for $\theta(f(x), A(x))$ is sufficient to obtain the above bounds on the number of iterations.

4 Conclusion 

In this paper we have proposed an approximation algorithm for the general max-min resource sharing problem that uses only $O(M(\ln M + \epsilon^{-2}\ln \epsilon^{-1}))$ iterations or calls to the block problem. The new algorithm is faster and simpler than the previously known approximation algorithms. Many combinatorial optimization problems (like the ones mentioned in the introduction) can be modelled as max-min resource sharing problems with an exponential number $N$ of variables and a polynomial number $M$ of constraints. Our approximation algorithm can be used to solve these optimization problems efficiently. The number of iterations of the approximation algorithm (or calls to the approximate block solver) depends polynomially only on $M$ and $1/\epsilon$; it is of order $O(M \ln M)$ if we neglect the dependence on $\epsilon$. Notice that Grigoriadis et al. [9] showed a lower bound of $\Omega(M)$ for the instance $f(x) = x$ and $B = P$: no Lagrangian decomposition scheme for this instance can bring in more than one vertex of the simplex $B$ per iteration. On the other hand, all of the $M$ vertices of $B$ are needed to obtain an approximate solution $x$ with $\lambda(f(x)) > 0$. An interesting open question is to find a lower bound that depends on $1/\epsilon$.
1 Introduction

Maximizing crane efficiency in ship loading and unloading plays an important role in enabling modern sea transportation systems to increase port throughput under the pressures arising from limited port size, high cargo transhipment volumes, and limited physical facilities and equipment [9]. This explains the concern with crane scheduling and management raised by the Port of Singapore Authority (PSA) [1] and other of the world's busiest ports (such as Hong Kong [12] and ports in Australia [6]).

Unlike general machine scheduling problems [2], the crane scheduling problem we consider focuses on cargo processed only from ships, where each crane is assumed to complete any job it is allocated. Jobs cannot be shared by cranes and are not preemptable. We take the set of $n$ jobs to be $J = \{1, 2, \ldots, n\}$ and the set of $m$ cranes to be $I = \{1, 2, \ldots, m\}$, with the usual ordering, and assume that jobs and cranes are located along two parallel lines. For jobs $a, b \in J$, $a < b$ means that $a$ precedes $b$, i.e., $a$ is to the left of $b$; likewise for cranes. Because positioning cranes takes relatively little time [9], the processing time $P_j \in \mathbb{Z}^+$ of each job $j \in J$ is given only by the time a crane takes to complete the job, assumed to be the same for all cranes. We seek a scheduling scheme consisting of a starting-time allocation map $s: J \to \mathbb{Z}_{\ge 0}$ and a job-to-crane allocation map $\sigma: J \to I$. For each $j \in J$, the processing time on crane $\sigma_j$ is given by $[s_j, s_j + P_j)$. Because of the non-crossing constraint for cranes moving on the same track, a scheme is feasible if and only if for any $k, j \in J$ with $k < j$, either $k$ and $j$ are processed separately in time, i.e., $[s_k, s_k + P_k) \cap [s_j, s_j + P_j) = \emptyset$, or $k$ and $j$ are processed on cranes that do not cross, i.e., $\sigma_k < \sigma_j$. The objective of the crane scheduling problem is to find a feasible schedule, consisting of $s$ and $\sigma$, that minimizes the latest completion time, i.e., that minimizes $\max_{j \in J} (s_j + P_j)$. A constraint programming model of the problem is given as follows.
$$\text{minimize} \ \max_{j \in J} (s_j + P_j)$$

s.t. for all $k, j \in J$ with $k < j$,

$$(\sigma_j \le \sigma_k) \Rightarrow [(s_j + P_j \le s_k) \lor (s_k + P_k \le s_j)],$$

where for all $j \in J$, $\sigma_j \in I$ and $s_j \in \mathbb{Z}_{\ge 0}$.



Figure 1 illustrates the non-crossing constraint with two cranes and five jobs located on two parallel lines. Because of the non-crossing constraint, jobs 4 and 2 cannot be processed simultaneously by cranes 1 and 2, respectively, but jobs 1 and 5 can.
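The non-crossing constraint can be checked mechanically. The Python sketch below is our own illustration, not code from the paper (the name `is_feasible` and the 0-based job indexing are assumptions): it tests every pair $k < j$ with $\sigma_j \le \sigma_k$ for overlapping processing intervals. Figure 1 gives no processing times, so the usage lines assume unit-length jobs and treat each pair of jobs as a two-job instance.

```python
from typing import Sequence

def is_feasible(P: Sequence[int], sigma: Sequence[int], s: Sequence[int]) -> bool:
    """Check the non-crossing constraint of the crane model.

    Jobs are 0-indexed from left to right; sigma[j] is the crane (1..m, left to
    right) and s[j] the start time of job j, which occupies [s[j], s[j] + P[j]).
    """
    n = len(P)
    for k in range(n):
        for j in range(k + 1, n):            # job k lies to the left of job j
            if sigma[j] <= sigma[k]:         # cranes would cross (or coincide)
                disjoint = s[j] + P[j] <= s[k] or s[k] + P[k] <= s[j]
                if not disjoint:
                    return False
    return True

print(is_feasible([1, 1], [2, 1], [0, 0]))  # jobs 2, 4 on cranes 2, 1 at once -> False
print(is_feasible([1, 1], [1, 2], [0, 0]))  # jobs 1, 5 on cranes 1, 2 at once -> True
```

Checking all pairs costs $O(n^2)$ time, which is adequate for validating schedules produced by the algorithms below.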



Fig. 1. An instance with two cranes and five jobs.



Existing models in the literature, prior to the model introduced in [7], did not incorporate spatial constraints for the crane problem. It has now become common for cranes to move on single tracks parallel to the length of ships. Because of this, cranes would have to cross each other in attempting to reach jobs located in different areas of a ship. [7] studied a model that took this impossibility into account as a "non-crossing" constraint and used a tabu search heuristic to find solutions. Recently, [13] devised a branch and bound search scheme and a simulated annealing algorithm for the model introduced in [7].

Related literature that focuses on crane scheduling without spatial constraints can be found in [3] and [9]. [3] studied the static problem where cranes were allowed to move freely from hold to hold and only one crane was allowed to work on a hold at any one time. In [9], cranes performed at constant rates and could interrupt work. This constituted a parallel identical machine problem where jobs consist of independent, single-stage and preemptable tasks. Other studies on port operations involving cranes can be found in [1], [12] and [6]. Studies involving cranes in the manufacturing environment can be found in [8] and [5].

The crane scheduling problem is easily solved when only one crane is considered, but it is NP-hard even when the number of cranes is fixed and greater than one. This is stated in the following Theorem 1, which can be proved by a reduction, similar to the one in [13], from the Partition problem [4], a well-known NP-hard problem.

Theorem 1. Given a fixed number of cranes $m \ge 2$, the crane scheduling problem is NP-hard.

Although various models have been studied for the crane scheduling problem in the literature, no approximation algorithm has been proposed. The main contributions of this paper are:
1. When the number of cranes is fixed, we provide a dynamic programming algorithm that solves the problem optimally in pseudo-polynomial time, and a fully polynomial time approximation scheme whose running time is polynomial in both the size of the problem and the reciprocal of the required solution accuracy.

2. When the number of cranes is arbitrary, we provide three polynomial-time algorithms that each guarantee an approximation factor of 2 and that exhibit good performance in experiments, comparable to the best metaheuristics proposed in [13].

The rest of this paper is organized as follows. In Section 2, we develop a dynamic programming algorithm that solves the problem exactly. In Section 3, we derive a fully polynomial time approximation scheme for the case when the number of cranes is fixed. Extending this, three 2-approximation algorithms are presented in Section 4, together with experimental results that show their practical performance.

2 A Dynamic Programming Algorithm 

The dynamic programming algorithm proposed here requires pseudo-polynomial time when the number of cranes $m$ is fixed. The idea behind it is as follows.

Firstly, we show that when the crane-allocation map $\sigma$ is given, an optimum time-allocation map $s$ that minimizes the latest completion time can be obtained efficiently, as follows. Noting that the constraints on $s$ depend on $\sigma$, we can decompose the decisions for $\sigma$ and for $s$, so that once $\sigma$ is given, the model depends only on $s$ and becomes:

$$\text{minimize} \ \max_{j \in J} (s_j + P_j)$$

s.t. for all $k, j \in J$ with $k < j$ and $\sigma_j \le \sigma_k$,

$$(s_j + P_j \le s_k) \lor (s_k + P_k \le s_j),$$

where for all $j \in J$, $s_j \in \mathbb{Z}_{\ge 0}$.

The above conditions on $s$ can be replaced by $s_k + P_k \le s_j$ without increasing the minimum latest completion time. To show this, we prove the following Theorem 2, which describes an implicit form of an optimum time-allocation map $s$ for a given crane-allocation map $\sigma$.

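Theorem 2. Given a crane allocation $\sigma$, an optimum time allocation $s$, which minimizes the latest completion time (i.e., $\max_{j \in J} (s_j + P_j)$), can be obtained as follows:

$$s_j = \begin{cases} 0, & j = 1,\\ \max_{k < j,\ \sigma_k \ge \sigma_j} (s_k + P_k), & j \ge 2, \end{cases} \qquad (1)$$

where the maximum over an empty set is taken to be 0.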

Proof. Let $s^*$ denote a time-allocation map that minimizes the latest completion time for the given $\sigma$. We prove that $s^*$ is no better than $s$, which implies that $s$ is an optimum time-allocation map as well.

For any time-allocation map $\theta$, let $T_\theta(p)$ denote the latest completion time of the jobs processed on cranes $p, p+1, \ldots, m$ under $\theta$. Because $T_\theta(1)$ is exactly the latest completion time of all jobs under $\theta$, to show that $s$ minimizes the latest completion time for the given crane-allocation map $\sigma$, it is sufficient to show the following lemma.

Lemma 1. For every $1 \le p \le m$, we have $T_s(p) \le T_{s^*}(p)$, where $s$ and $s^*$ are as in the proof above.

To prove Lemma 1, we define a gap (respectively, a non-gap) for a crane to be a time slot in which the crane is idle (respectively, busy). Thus, under a time-allocation map $\theta$, let $g_\theta(p, w)$ indicate whether $[w, w+1)$ is a gap or a non-gap for crane $p$, for $1 \le p \le m$ and $0 \le w \le T_\theta(p) - 1$. Hence, $g_\theta(p, w)$ is 1 if no job is processed on $p$ during $[w, w+1)$, and 0 otherwise. Further, let $l_\theta(p, w)$ denote the leftmost job processed on cranes $p, p+1, \ldots, m$ during $[w, w+1)$; if no such job exists, set $l_\theta(p, w) = +\infty$. Accordingly, if $g_\theta(p, w) = 0$, then $l_\theta(p, w) = j < l_\theta(p+1, w)$, where $j$ is the only job in $J_p$ with $w \in [s_j, s_j + P_j)$. Otherwise $g_\theta(p, w) = 1$, and we have $l_\theta(p, w) = +\infty$ if $p = m$, and $l_\theta(p, w) = l_\theta(p+1, w)$ otherwise. Therefore, $l_\theta(p, w) \le l_\theta(p+1, w)$ for $1 \le p < m$. Besides this, it is easy to verify that the $s$ given by (1) must lead to $l_s(p, 0) \le l_s(p, 1) \le \cdots \le l_s(p, T_s(p) - 1)$ for $1 \le p \le m$.

We now construct a series of injections $\{f_p\}$ recursively for $p$ from $m$ down to 1, where $f_p$ is an injection from $\{0, \ldots, T_s(p) - 1\}$ to $\{0, \ldots, T_{s^*}(p) - 1\}$ satisfying $l_{s^*}(p, f_p(w)) \le l_s(p, w)$ for $0 \le w \le T_s(p) - 1$. Because $f_p$ is an injection, this is sufficient to show $T_s(p) \le T_{s^*}(p)$.

In the following arguments, let $J_p = \{j \mid \sigma_j = p, j \in J\}$ denote the set of jobs processed by crane $p$, and let $a_{p,1} < a_{p,2} < \cdots < a_{p,n_p}$ represent the $n_p$ jobs in $J_p$, ordered by their locations from left to right.

To construct $\{f_p\}$, we take $p = m$ first. Because jobs are processed consecutively on $m$ under $s$, no gaps exist. We define $f_m(s_a + \Delta) = s^*_a + \Delta$ for $0 \le \Delta \le P_a - 1$ and every $a \in J_m$. Because $l_{s^*}(m, f_m(s_a + \Delta)) = a = l_s(m, s_a + \Delta)$ and no two jobs are simultaneously processed on $m$ under either $s$ or $s^*$, $f_m$ must be an injection from $\{0, \ldots, T_s(m) - 1\}$ to $\{0, \ldots, T_{s^*}(m) - 1\}$ and satisfy $l_{s^*}(m, f_m(w)) \le l_s(m, w)$ for $0 \le w \le T_s(m) - 1$.

Now, assume that an injection $f_{p+1}$ from $\{0, \ldots, T_s(p+1) - 1\}$ to $\{0, \ldots, T_{s^*}(p+1) - 1\}$ has been obtained and satisfies $l_{s^*}(p+1, f_{p+1}(w)) \le l_s(p+1, w)$ for $0 \le w \le T_s(p+1) - 1$, where $1 \le p \le m - 1$. We can define an injection $f_p$ on the non-gaps and gaps of crane $p$ under $s$ as follows.

On the one hand, under the time-allocation map $s$, the time slots that are non-gaps for crane $p$ must be covered exactly by the processing slots $[s_a, s_a + P_a)$ for all $a \in J_p$. Let $f_p(s_a + \Delta) = s^*_a + \Delta$ for $0 \le \Delta \le P_a - 1$ and every $a \in J_p$. By arguments similar to those for the case $p = m$, we also have $l_{s^*}(p, f_p(w)) \le l_s(p, w)$ when $[w, w+1)$ is a non-gap. Moreover, it is easy to see that $[f_p(w), f_p(w) + 1)$ is a non-gap on $p$ under $s^*$, which implies $g_{s^*}(p, f_p(w)) = 0$.

On the other hand, consider every gap $[w, w+1)$ for $p$ under $s$. We have $l_s(p+1, w) = l_s(p, w)$. Because of (1), the jobs in $J_p$ to the left (or right) of $l_s(p+1, w)$ must be processed by $p$ before (or after, respectively) $w$. So $\sum_{\Delta=0}^{w} g_s(p, \Delta) = w + 1 - \sum_{a \in J_p,\, a < l_s(p+1, w)} P_a$. As shown in Figure 2, for all $0 \le \Delta \le w$, $l_{s^*}(p, f_{p+1}(\Delta)) \le l_{s^*}(p+1, f_{p+1}(\Delta)) \le l_s(p+1, \Delta) \le l_s(p+1, w) = l_s(p, w)$. Because $f_{p+1}$ is an injection, we have $\sum_{\Delta=0}^{w} g_{s^*}(p, f_{p+1}(\Delta)) \ge w + 1 - \sum_{a \in J_p,\, a < l_s(p+1, w)} P_a = \sum_{\Delta=0}^{w} g_s(p, \Delta)$. Accordingly, let $x$ be the threshold, i.e., the smallest nonnegative value of $w'$ that satisfies $\sum_{\Delta=0}^{w'} g_{s^*}(p, f_{p+1}(\Delta)) = \sum_{\Delta=0}^{w} g_s(p, \Delta)$. Define $f_p(w) = f_{p+1}(x)$. Noting $0 \le x \le w$, we obtain $l_{s^*}(p, f_p(w)) = l_{s^*}(p, f_{p+1}(x)) \le l_s(p, w)$. Moreover, $g_{s^*}(p, f_p(w))$ must be 1, because $x$ is the threshold and $g_s(p, w) = 1$.



Fig. 2. Illustration of the proof of Lemma 1



We have shown that $l_{s^*}(p, f_p(w)) \le l_s(p, w)$ for $0 \le w \le T_s(p) - 1$. Now we can prove that $f_p$ is an injection from $\{0, \ldots, T_s(p) - 1\}$ to $\{0, \ldots, T_{s^*}(p) - 1\}$. Consider any two different $w_1$ and $w_2$ in $\{0, \ldots, T_s(p) - 1\}$. Three cases occur. Firstly, if both $[w_1, w_1 + 1)$ and $[w_2, w_2 + 1)$ are non-gaps, it is easy to see that $f_p(w_1) \neq f_p(w_2)$. Secondly, if exactly one of them is a gap, assume that $[w_1, w_1 + 1)$ is a gap but $[w_2, w_2 + 1)$ is not. Since $g_{s^*}(p, f_p(w_1)) = 1$ and $g_{s^*}(p, f_p(w_2)) = 0$, we know $f_p(w_1) \neq f_p(w_2)$. Lastly, if both of them are gaps, let $x_1$ and $x_2$ be their thresholds, so that $f_p(w_1) = f_{p+1}(x_1)$ and $f_p(w_2) = f_{p+1}(x_2)$. Because $g_s(p, w_1) = 1$ and $g_s(p, w_2) = 1$, we have $\sum_{\Delta=0}^{w_1} g_s(p, \Delta) \neq \sum_{\Delta=0}^{w_2} g_s(p, \Delta)$, implying $x_1 \neq x_2$. Since $f_{p+1}$ is an injection, we know $f_p(w_1) \neq f_p(w_2)$.

We have thus obtained, recursively, injections $f_p$ from $\{0, \ldots, T_s(p) - 1\}$ to $\{0, \ldots, T_{s^*}(p) - 1\}$ satisfying $l_{s^*}(p, f_p(w)) \le l_s(p, w)$ for $0 \le w \le T_s(p) - 1$, for every $p$ from 1 to $m$. Because these are injections, we can conclude that $T_s(p) \le T_{s^*}(p)$ for $1 \le p \le m$, which completes the proof of Lemma 1 and establishes the correctness of Theorem 2. □

Theorem 2 provides an optimum scheduling policy for a given crane-allocation map. Let every crane process the jobs assigned to it in order, from left to right. If two cranes would need to cross, the left crane does not start its next job until the right crane completes its assigned job. This simple policy guarantees the minimum latest completion time without cranes crossing. By employing an array $C[i]$ to store the latest completion time of crane $i$ after jobs $1$ to $j$ have been processed, for $j = 1, 2, \ldots, n$, we obtain the following Algorithm 1, which computes an optimum time-allocation map (1) for any given crane-allocation map in $O(mn)$ time.

By Theorem 2, the problem of finding a crane-allocation map $\sigma$ can be described concisely by:
Algorithm 1 (A time allocation algorithm for a given crane-allocation map $\sigma$)

1: $C[i] \leftarrow 0$, for $1 \le i \le m$;

2: $s_1 \leftarrow 0$; $C[\sigma_1] \leftarrow P_1$;

3: for $j = 2$ to $n$ do

4: $s_j \leftarrow \max_{i \ge \sigma_j} C[i]$;

5: $C[\sigma_j] \leftarrow s_j + P_j$;

6: end for

7: Return $s$;
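Algorithm 1 translates almost line by line into Python. In this sketch (the function name and the 0-based job indexing are our assumptions), the first job is handled by the same loop body, since all entries of $C$ are still 0 at that point.

```python
from typing import List, Sequence

def time_allocation(P: Sequence[int], sigma: Sequence[int], m: int) -> List[int]:
    """Algorithm 1: optimum start times (1) for a fixed crane allocation.

    Jobs are indexed 0..n-1 from left to right; sigma[j] is in 1..m (left to right).
    C[i] holds the latest completion time on crane i among the jobs scheduled so far.
    """
    C = [0] * (m + 1)                  # index 0 unused
    s = [0] * len(P)
    for j, p in enumerate(P):
        # job j waits for every earlier (left) job on its own crane or on any
        # crane to its right, i.e., for the maximum of C[sigma[j]], ..., C[m]
        s[j] = max(C[sigma[j]:])
        C[sigma[j]] = s[j] + p
    return s

print(time_allocation([4, 4, 1, 1], [1, 2, 1, 2], 2))  # [0, 0, 4, 4]
```

Each job scans at most $m$ entries of $C$, which gives the $O(mn)$ bound stated in the text.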

$$\text{minimize} \ \max_{j \in J} (s_j + P_j)$$

s.t. (1),

where for all $j \in J$, $\sigma_j \in I$.
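Because the model above depends only on $\sigma$, a tiny instance can be solved exactly by enumerating all $m^n$ crane allocations and timing each with Algorithm 1. The Python sketch below does this; it is an exponential-time baseline of our own (useful for checking faster methods), not the paper's algorithm, and the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

def time_allocation(P: Sequence[int], sigma: Sequence[int], m: int) -> List[int]:
    # Algorithm 1: start each job after every earlier job on its own crane
    # or on any crane to its right
    C = [0] * (m + 1)
    s = []
    for p, c in zip(P, sigma):
        start = max(C[c:])
        s.append(start)
        C[c] = start + p
    return s

def brute_force_schedule(P: Sequence[int], m: int) -> Tuple[int, Tuple[int, ...], List[int]]:
    """Exact optimum for tiny instances: try all m**n crane allocations."""
    best = None
    for sigma in product(range(1, m + 1), repeat=len(P)):
        s = time_allocation(P, sigma, m)
        makespan = max(sj + pj for sj, pj in zip(s, P))
        if best is None or makespan < best[0]:
            best = (makespan, sigma, s)
    return best

print(brute_force_schedule([4, 4, 4, 1, 1, 1], 3)[0])  # 5
```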

This model implies that only an optimum crane allocation needs to be determined to minimize the latest completion time. Fortunately, an optimum crane allocation can be determined by the following dynamic programming algorithm.

Let $J(q) = \{1, 2, \ldots, q\}$ denote the set of the $q$ leftmost jobs. Let $A(q, w_m, w_{m-1}, \ldots, w_2)$ denote the minimum latest completion time of the jobs in $J(q)$ processed on crane 1, where the latest completion times of the jobs in $J(q)$ processed on the other cranes $m, m-1, \ldots, 2$ are exactly $w_m, w_{m-1}, \ldots, w_2$, respectively. The value of $A(q, w_m, w_{m-1}, \ldots, w_2)$ can be computed recursively by the following dynamic programming algorithm.

Initially, $A(0, w_m, w_{m-1}, \ldots, w_2)$ is set to zero if $w_m = w_{m-1} = \cdots = w_2 = 0$, and to positive infinity otherwise.

When $q > 0$, consider job $q$. Assume $q$ is processed by crane $p$, whose previous job $q'$ has completion time $w'_p$. Here $q$ must be started exactly after a certain job $r$, namely the latest completed job on cranes $p, p+1, \ldots, m$ in $J(q-1)$. If $p = 1$, it is easy to see that $q$ must be started at $\max(A(q-1, w_m, w_{m-1}, \ldots, w_2), w_m, w_{m-1}, \ldots, w_2)$. Otherwise, $p \ge 2$. If $r$ is processed on $p$, as shown in Figure 3(a), then $r = q'$, and its completion time $w_p - P_q$ must be no smaller than $\max(w_m, w_{m-1}, \ldots, w_{p+1})$ (the maximum over an empty set being 0). In this case, we have $A(q, w_m, w_{m-1}, \ldots, w_p, \ldots, w_2) = A(q-1, w_m, w_{m-1}, \ldots, w_p - P_q, \ldots, w_2)$. Otherwise, as shown in Figure 3(b), $r$ is completed at $\max(w_m, w_{m-1}, \ldots, w_{p+1})$, which equals $w_p - P_q$ and is larger than $w'_p$. In this case, $p$ must be less than $m$, and $A(q, w_m, w_{m-1}, \ldots, w_p, \ldots, w_2)$ equals $\min_{0 \le w'_p < \max(w_m, w_{m-1}, \ldots, w_{p+1})} A(q-1, w_m, w_{m-1}, \ldots, w'_p, \ldots, w_2)$.

We have thus obtained:

$$A(q, w_m, w_{m-1}, \ldots, w_2) = \min \begin{cases} A(q-1, w_m, \ldots, w_p - P_q, \ldots, w_2) & \text{for all } 2 \le p \le m \text{ with } \max\{w_m, \ldots, w_{p+1}\} \le w_p - P_q,\\ \min_{0 \le w'_p < \max\{w_m, \ldots, w_{p+1}\}} A(q-1, w_m, \ldots, w'_p, \ldots, w_2) & \text{for all } 2 \le p \le m-1 \text{ with } \max\{w_m, \ldots, w_{p+1}\} = w_p - P_q,\\ \max(A(q-1, w_m, \ldots, w_2), w_m, \ldots, w_2) + P_q. \end{cases} \qquad (2)$$
(a) $r$ is processed on $p$; (b) $r$ is not processed on $p$

Fig. 3. Recursive computation of $A(q, w_m, w_{m-1}, \ldots, w_2)$

It follows that the minimum latest completion time is the smallest value of $\max(A(n, w_m, w_{m-1}, \ldots, w_2), w_m, w_{m-1}, \ldots, w_2)$ over all $0 \le w_m, w_{m-1}, \ldots, w_2 \le T$, where $T = \sum_{j \in J} P_j$ is the total processing time required to process all jobs on the same crane. It is now easily verified that the time complexity of computing $A(q, w_m, w_{m-1}, \ldots, w_2)$ recursively is $O(nT^{m-1}(m + (m-2)T)) = O(nmT^{m-1} + n(m-2)T^m)$, and we have obtained a pseudo-polynomial algorithm for the crane scheduling problem when $m$, the number of cranes, is fixed. Note that when $m$ equals 2 (or 3), the time complexity becomes $O(nT)$ (or $O(nT^3)$, respectively), which is reasonable when $T$ is not large.
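For $m = 2$ the recurrence (2) becomes particularly simple: the middle case is empty, and the condition of the first case reduces to $w_2 \ge P_q$. The state is then $A(q, w_2)$ for $0 \le w_2 \le T$, which yields the $O(nT)$ bound. The Python sketch below is our own specialisation of (2) to two cranes, with infinity marking unreachable states; the function name is hypothetical.

```python
import math
from typing import Sequence

def dp_two_cranes(P: Sequence[int]) -> int:
    """Recurrence (2) specialised to m = 2, in O(nT) time.

    A[w] is the minimum completion time on crane 1 over the jobs seen so far,
    given that the latest completion time on crane 2 is exactly w.
    """
    T = sum(P)
    A = [0] + [math.inf] * T            # q = 0: only w = 0 is reachable
    for p in P:
        new = [math.inf] * (T + 1)
        for w in range(T + 1):
            # job q on crane 2: it starts right after crane 2's previous job
            if w >= p and A[w - p] < math.inf:
                new[w] = min(new[w], A[w - p])
            # job q on crane 1: it must wait for both cranes
            if A[w] < math.inf:
                new[w] = min(new[w], max(A[w], w) + p)
        A = new
    return min(max(A[w], w) for w in range(T + 1) if A[w] < math.inf)

print(dp_two_cranes([4, 4, 1, 1]))  # 5
```

Keeping only the minimum crane-1 completion time for each $w_2$ is safe because later start times depend monotonically on it.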

3 A Fully Polynomial Time Approximation Scheme 

Based on the dynamic programming algorithm shown in Section 2, a fully polynomial time approximation scheme (FPTAS) can be derived as follows for instances with a fixed number of cranes.

Recall that the recurrence equation (2) implies a pseudo-polynomial algorithm. Thus, we can apply scaling and rounding [11] to obtain an FPTAS, similar to the way an FPTAS is found for multiple machine scheduling problems [10]. The basic idea is to scale and round the processing time of every job so that it is bounded by a polynomial in $n$ and $1/\epsilon$, for any $\epsilon > 0$. For the new instance, an optimal schedule can be obtained by the given dynamic programming algorithm in polynomial time. From this schedule, a solution to the original instance can be obtained whose latest completion time is at most $(1 + \epsilon)$ times the optimum.

Formally, let $OPT$, $\sigma$ and $s$ represent the latest completion time, the crane-allocation map and the time-allocation map of the optimal schedule, respectively, for a given instance $X$. By Theorem 2, $s$ can be obtained from $\sigma$ by Algorithm 1 in polynomial time. We now scale and round $X$ into a new instance $X'$, where $OPT'$, $\sigma'$ and $s'(\sigma')$ denote the latest completion time, the crane-allocation map, and the time-allocation map of the optimal schedule of $X'$, respectively. Let $T_{\max}$ denote the maximum value of $P_j$ over $j \in J$ in $X$, and let $K = \epsilon T_{\max}/n$. The new instance $X'$ is then generated by $P'_j = \lfloor P_j/K \rfloor$, which is at most $\lfloor n/\epsilon \rfloor$, a bound polynomial in $n$ and $1/\epsilon$. An optimum $\sigma'$ for $X'$ can then be computed in polynomial time by the dynamic programming algorithm of (2). Applying the same crane allocation map $\sigma'$ to the original instance $X$, we can obtain a time-allocation map, denoted by $s(\sigma')$, for $X$ by Algorithm 1 in polynomial time. This process is formulated in Algorithm 2, which achieves at most $(1 + \epsilon)$ times the minimum latest completion time, as shown in Lemma 2.

Algorithm 2 (A fully polynomial time approximation scheme)

1: Given $\epsilon$ and an instance $X$, let $K = \epsilon T_{\max}/n$.

2: For each job $j \in J$, define $P'_j = \lfloor P_j/K \rfloor$ to obtain the new instance $X'$.

3: Use the dynamic programming algorithm to find an optimum crane allocation map $\sigma'$ that achieves the minimum latest completion time for $X'$;

4: Adopting the same crane allocation map $\sigma'$ for $X$, use Algorithm 1 to obtain the time-allocation map $s(\sigma')$ for $X$;

5: Return $\sigma'$ and $s(\sigma')$ as the schedule for $X$.
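Steps 1 and 2 of Algorithm 2 are the only new ingredient; steps 3 to 5 reuse the dynamic program and Algorithm 1. The Python sketch below (function name hypothetical) performs the scaling and rounding with exact rational arithmetic, assuming $\epsilon$ is passed as a `Fraction`, so that the floor is not perturbed by floating-point error.

```python
import math
from fractions import Fraction
from typing import List, Sequence, Tuple

def scale_and_round(P: Sequence[int], eps: Fraction) -> Tuple[Fraction, List[int]]:
    """Steps 1-2 of Algorithm 2: K = eps * T_max / n and P'_j = floor(P_j / K)."""
    n = len(P)
    K = Fraction(eps) * max(P) / n
    # each rounded time is at most floor(n / eps), since P_j <= T_max = n * K / eps
    return K, [math.floor(Fraction(p) / K) for p in P]

print(scale_and_round([10, 7, 3], Fraction(1, 2)))  # (Fraction(5, 3), [6, 4, 1])
```

With $n = 3$ and $\epsilon = 1/2$, the largest rounded time equals $\lfloor n/\epsilon \rfloor = 6$, matching the bound in the text.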



Lemma 2. The schedule with $\sigma'$ and $s(\sigma')$ from Algorithm 2 satisfies $s(\sigma')_j + P_j \le (1 + \epsilon)OPT$ for every $j \in J$.

Proof. We first prove $OPT' \le OPT/K$. Using $\sigma$, a time-allocation map denoted by $s'(\sigma)$ can be found for $X'$ by Algorithm 1. Let $J_p = \{a_{p,1}, a_{p,2}, \ldots, a_{p,n_p}\}$ denote the set of $n_p$ jobs processed on $p$ under $\sigma$, where $a_{p,1} < a_{p,2} < \cdots < a_{p,n_p}$, for $p \in I$. We now show by induction that $s'(\sigma)_{a_{p,j}} \le s(\sigma)_{a_{p,j}}/K$ for $p \in I$ and $1 \le j \le n_p$.

When $p = m$ and $j = 1$, obviously $s'(\sigma)_{a_{m,1}} = 0 \le s(\sigma)_{a_{m,1}}/K$. Then, assume $s'(\sigma)_{a_{r,q}} \le s(\sigma)_{a_{r,q}}/K$ for all $r$ and $q$ where $p+1 \le r \le m$, or $r = p$ and $1 \le q \le j-1$. Now let us consider $s'(\sigma)_{a_{p,j}}$. When $j = 1$, we have $s'(\sigma)_{a_{p,1}} = \max(0, \max_{a_{r,i} < a_{p,1},\, p+1 \le r \le m} s'(\sigma)_{a_{r,i}} + P'_{a_{r,i}})$. By the assumption, and since $P'_{a_{r,i}} \le P_{a_{r,i}}/K$, we know $s'(\sigma)_{a_{p,1}} \le s(\sigma)_{a_{p,1}}/K$. Otherwise $j > 1$, and we have $s'(\sigma)_{a_{p,j}} = \max(s'(\sigma)_{a_{p,j-1}} + P'_{a_{p,j-1}}, \max_{a_{r,i} < a_{p,j},\, p+1 \le r \le m} s'(\sigma)_{a_{r,i}} + P'_{a_{r,i}})$. By the assumption, together with $P'_{a_{p,j-1}} \le P_{a_{p,j-1}}/K$ and $P'_{a_{r,i}} \le P_{a_{r,i}}/K$, we have $s'(\sigma)_{a_{p,j}} \le s(\sigma)_{a_{p,j}}/K$. This completes the proof that $s'(\sigma)_{a_{p,j}} \le s(\sigma)_{a_{p,j}}/K$ for $p \in I$ and $1 \le j \le n_p$. Hence, since $s'(\sigma)$ is a feasible schedule for $X'$, we obtain $OPT' \le \max_{j \in J}(s'(\sigma)_j + P'_j) \le \max_{j \in J}(s(\sigma)_j + P_j)/K = OPT/K$.

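Next we show $s(\sigma')_j + P_j \le (1 + \epsilon)OPT$ for all $j \in J$. Note that the time-allocation map $s(\sigma')$ for $X$ is generated from $\sigma'$ by Algorithm 1. Let $J'_p = \{a'_{p,1}, a'_{p,2}, \ldots, a'_{p,n'_p}\}$ be the set of $n'_p$ jobs processed on $p$ under $\sigma'$, where $a'_{p,1} < a'_{p,2} < \cdots < a'_{p,n'_p}$, for $p \in I$. We now prove, by the following induction, that $s(\sigma')_{a'_{p,j}} \le K s'(\sigma')_{a'_{p,j}} + \mu(p,j)K$, where $\mu(p,j) = \sum_{r=p+1}^{m} n'_r + (j - 1)$. When $p = m$ and $j = 1$, $s(\sigma')_{a'_{m,1}} = 0 \le K s'(\sigma')_{a'_{m,1}} + \mu(m,1)K$, because $\mu(m,1) = 0$. Then, assume $s(\sigma')_{a'_{r,q}} \le K s'(\sigma')_{a'_{r,q}} + \mu(r,q)K$ for all $r$ and $q$ where $p+1 \le r \le m$, or $r = p$ and $1 \le q \le j-1$. Let us consider $s(\sigma')_{a'_{p,j}}$. When $j = 1$, $s(\sigma')_{a'_{p,1}} = \max(0, \max_{a'_{r,i} < a'_{p,1},\, p+1 \le r \le m} s(\sigma')_{a'_{r,i}} + P_{a'_{r,i}})$. By the assumption and $P_{a'_{r,i}} < P'_{a'_{r,i}}K + K$, we know $s(\sigma')_{a'_{p,1}} \le K s'(\sigma')_{a'_{p,1}} + \mu(p,0)K + K = K s'(\sigma')_{a'_{p,1}} + \mu(p,1)K$. Otherwise $j > 1$, and $s(\sigma')_{a'_{p,j}} = \max(s(\sigma')_{a'_{p,j-1}} + P_{a'_{p,j-1}}, \max_{a'_{r,i} < a'_{p,j},\, p+1 \le r \le m} s(\sigma')_{a'_{r,i}} + P_{a'_{r,i}})$. By the assumption, together with $P_{a'_{p,j-1}} < P'_{a'_{p,j-1}}K + K$ and $P_{a'_{r,i}} < P'_{a'_{r,i}}K + K$, we can see that $s(\sigma')_{a'_{p,j}} \le K s'(\sigma')_{a'_{p,j}} + \mu(p,j-1)K + K = K s'(\sigma')_{a'_{p,j}} + \mu(p,j)K$. This completes the proof that $s(\sigma')_{a'_{p,j}} \le K s'(\sigma')_{a'_{p,j}} + \mu(p,j)K$ for $p \in I$ and $1 \le j \le n'_p$.

Hence, for every $j \in J$, let $p = \sigma'_j$, so that $j = a'_{p,i} \in J'_p$ for some $i$. Because $T_{\max} \le OPT$ and $\mu(p,i) + 1 \le n$, we have $s(\sigma')_j + P_j \le K s'(\sigma')_j + \mu(p,i)K + KP'_j + K \le K \cdot OPT' + nK \le OPT + \epsilon T_{\max} \le (1 + \epsilon)OPT$, which completes the proof. □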

By Lemma 2, the latest completion time of the schedule produced by Algorithm 2 is at most $(1 + \epsilon)OPT$. Since the total processing time on a single crane in $X'$ is at most $n^2/\epsilon$, the running time of Algorithm 2 can be estimated as $O(nm(n^2/\epsilon)^{m-1} + n(m-2)(n^2/\epsilon)^m)$, which is bounded by a polynomial in $n$ and $1/\epsilon$ when $m$ is fixed. This establishes the following theorem.

Theorem 3. Algorithm 2 is a fully polynomial time approximation scheme for the crane scheduling problem with a fixed number $m$ of cranes.

4 2-Approximation Algorithms

Although Algorithm 2 runs in polynomial time for any fixed $m$, regardless of the job processing times, its running time grows exponentially when the number of cranes is arbitrary. To address this shortcoming, we have developed the following three 2-approximation algorithms, denoted APPX1, APPX2 and APPX3, whose time complexity is polynomial for any instance. The basic idea behind these algorithms is to assign jobs to cranes as evenly as possible while ensuring that cranes have as little idle time as possible.

APPX1. Let $OPT$ denote the minimum latest completion time and let $AVG = (\sum_{j \in J} P_j)/m$ denote the average processing time of the $n$ jobs on $m$ cranes. It is easy to see that $OPT \ge AVG$. Assign crane 1 to process jobs $1, 2, \ldots, j_1$ sequentially, where $\sum_{j=1}^{j_1} P_j \ge AVG > \sum_{j=1}^{j_1 - 1} P_j$. Similarly, assign crane $i$ to process jobs $j_{i-1}+1, j_{i-1}+2, \ldots, j_i$ sequentially, where $\sum_{j=j_{i-1}+1}^{j_i} P_j \ge AVG > \sum_{j=j_{i-1}+1}^{j_i - 1} P_j$, for $2 \le i \le m-1$. The remaining $n - j_{m-1}$ jobs, from $j_{m-1}+1$ to $n$, are processed by the last crane $m$. Clearly, each crane can process its jobs continuously and independently, and the cranes will not cross each other, because the jobs processed by crane $i$ all lie to the left of the jobs processed by the cranes to the right of $i$, for all $i \in I$. This approach is implemented in Algorithm 3, which has a time complexity of $O(n)$. The following theorem verifies the approximation factor.

Theorem 4. $\text{APPX1} \le 2\,OPT$.

Proof. On the one hand, for each crane $i \le m - 1$, the completion time is at most $AVG + P_{j_i}$, which is not more than $2\,OPT$, because $AVG \le OPT$ and $P_{j_i} \le OPT$. On the other hand, for the last crane $m$, the completion
Algorithm 3 APPX1.

1: Let $AVG \leftarrow (\sum_{j \in J} P_j)/m$;

2: Let $sum \leftarrow 0$ and $i \leftarrow 1$;

3: for $j = 1$ to $n$ do
4: $\sigma(j) \leftarrow i$ and $s(j) \leftarrow sum$;

5: $sum \leftarrow sum + P_j$;

6: if $sum \ge AVG$ and $i < m$ then

7: $sum \leftarrow 0$ and $i \leftarrow i + 1$;

8: end if

9: end for
10: Return $\sigma$ and $s$;

time is at most $AVG$, which is not more than $OPT$. This proves $\text{APPX1} \le 2\,OPT$. □

As shown in Figure 4, the factor of 2 is tight for APPX1 with respect to the instance that has $m$ cranes and $2m$ jobs, where each of the leftmost $m$ jobs needs $L - 1$ time slots and each of the rightmost $m$ jobs needs 1 time slot. The best schedule assigns each crane two jobs, one from the leftmost $m$ jobs and the other from the rightmost $m$ jobs, with a minimum latest completion time of exactly $L$. APPX1, however, assigns two $(L-1)$-length jobs to the first crane, with a latest completion time of $2(L-1)$.
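The following Python sketch of Algorithm 3 (names and 0-based job indexing are our assumptions) reproduces this behaviour on the tight instance with $m = 3$ and $L = 5$, i.e., processing times $4, 4, 4, 1, 1, 1$, where the optimum is $L = 5$. The guard `crane < m` mirrors the rule that the last crane takes all remaining jobs.

```python
from typing import List, Sequence, Tuple

def appx1(P: Sequence[int], m: int) -> Tuple[List[int], List[int], int]:
    """Algorithm 3 (APPX1): cut the job sequence into consecutive blocks of load >= AVG.

    Returns (sigma, s, makespan); cranes are numbered 1..m from left to right.
    """
    avg = sum(P) / m
    sigma, s = [], []
    load, crane = 0, 1
    for p in P:
        sigma.append(crane)
        s.append(load)                   # jobs on a crane run back to back from time 0
        load += p
        if load >= avg and crane < m:    # this crane's block is full; move right
            load, crane = 0, crane + 1
    makespan = max(sj + pj for sj, pj in zip(s, P))
    return sigma, s, makespan

print(appx1([4, 4, 4, 1, 1, 1], 3))  # ([1, 1, 2, 2, 3, 3], [0, 4, 0, 4, 0, 1], 8)
```

The makespan 8 equals $2(L-1)$, while the optimum is 5.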



APPX2. To improve on APPX1, a dynamic programming algorithm can be used. In this scheme, we continue to assign adjacent jobs $j_{i-1}+1, j_{i-1}+2, \ldots, j_i$ to crane $i$ for $1 \le i \le m$ (with $j_0 = 0$ and $j_m = n$), but need to decide the best partition points $j_1, j_2, \ldots, j_{m-1}$, which minimize the latest completion time. Let $A[i, j]$ denote the minimum latest completion time when jobs $1, 2, \ldots, j$ are partitioned in an adjacent manner among cranes $1, 2, \ldots, i$. By enumerating all possible partition points $j_{i-1}$, we obtain the following dynamic programming equations (APPX2):

$$A[1, j] = T[1, j], \quad \text{for } 1 \le j \le n, \qquad (3)$$

$$A[i, j] = \min_{1 \le j_{i-1} \le j} \max\{A[i-1, j_{i-1}], T[j_{i-1}+1, j]\}, \quad \text{for } 2 \le i \le m,\ 1 \le j \le n, \qquad (4)$$

where $T[j_1, j_2] = \sum_{j=j_1}^{j_2} P_j$ denotes the total processing time of jobs $j_1, j_1+1, \ldots, j_2$ (and is 0 when $j_1 > j_2$). The time complexity here is $O(mn^2)$. Since APPX2 optimizes the partition points $j_1, j_2, \ldots, j_{m-1}$, the latest completion time it gives can be no larger than that given by APPX1. Hence, we have



Fig. 4. A tight instance for APPX1 and APPX2: $m$ jobs of length $L - 1$ followed by $m$ jobs of length 1.
Theorem 5. $\text{APPX2} \le \text{APPX1} \le 2\,OPT$.

Let us consider the worst case shown in Figure 4 again. If we fix $L = m$, APPX2 returns a schedule with latest completion time $2m - 2$, still nearly twice the optimum. However, if $L \gg m$, the latest completion time produced by APPX2 becomes $L + m - 1$, which is much closer to the optimum $L$ than that produced by APPX1.
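Equations (3) and (4) can be sketched in Python with prefix sums, so that $T[a, b]$ costs $O(1)$ and the whole table costs $O(mn^2)$. This is our own illustrative implementation (the name `appx2` is hypothetical), and, as in the equations above, it allows a crane to receive an empty block.

```python
from typing import Sequence

def appx2(P: Sequence[int], m: int) -> int:
    """Equations (3)-(4): best split of jobs 1..n into m consecutive blocks.

    prefix[j] = P_1 + ... + P_j, so T[a, b] = prefix[b] - prefix[a - 1].
    A[i][j] is the minimum latest completion time of jobs 1..j on cranes 1..i.
    """
    n = len(P)
    prefix = [0]
    for p in P:
        prefix.append(prefix[-1] + p)
    A = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        A[1][j] = prefix[j]                                      # equation (3)
    for i in range(2, m + 1):
        for j in range(1, n + 1):
            # k plays the role of the partition point j_{i-1}; k = j leaves crane i empty
            A[i][j] = min(max(A[i - 1][k], prefix[j] - prefix[k])  # equation (4)
                          for k in range(1, j + 1))
    return A[m][n]

print(appx2([4, 4, 4, 1, 1, 1], 3))  # 7 = L + m - 1 for L = 5, m = 3
print(appx2([2, 2, 2, 1, 1, 1], 3))  # 4 = 2m - 2 for L = m = 3
```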



APPX3. If we analyze the instance shown in Figure 4, we find that the optimum schedule divides the $2m$ jobs into two parts: the leftmost $m$ jobs are processed by the $m$ cranes first, after which the rightmost $m$ jobs are processed. This suggests a new dynamic programming algorithm, APPX3, which extends APPX2 by allowing the division of a job sequence. Suppose we are to schedule a job sequence $S = \{a, a+1, \ldots, b-1, b\}$ on $c$ cranes. Let $k$ denote the division point of the two job subsequences $S_1 = \{a, a+1, \ldots, k\}$ and $S_2 = \{k+1, k+2, \ldots, b\}$. There are then two ways in which $S_1$ and $S_2$ can be processed. One is to process the jobs in $S_2$ after all jobs in $S_1$ are completed, and the other is to process the jobs in $S_1$ by the leftmost $p$ cranes and the jobs in $S_2$ by the rightmost $c - p$ cranes, independently. It is easy to verify that both are feasible.

Let $B[c, a, b]$ denote the minimum latest completion time found by dividing the job sequence $\{a, a+1, \ldots, b-1, b\}$ among $c$ cranes. By the above arguments, we have the following dynamic programming equations (APPX3), for $1 \le c \le m$ and $1 \le a \le b \le n$:

$$B[1, a, b] = T[a, b] \qquad (5)$$

$$B[c, a, b] = \min\Big\{ \min_{a \le k < b} \big(B[c, a, k] + B[c, k+1, b]\big),\ \min_{1 \le p < c,\ a \le k < b} \max\{B[p, a, k], B[c-p, k+1, b]\} \Big\}, \quad \text{if } c \ge 2, \qquad (6)$$

where $T[a, b] = \sum_{j=a}^{b} P_j$ denotes the total processing time of jobs $a, a+1, \ldots, b$. The time complexity here is $O(m^2 n^3)$. Finally, because APPX3 improves on APPX2, we have the following theorem.

Theorem 6. $\text{APPX3} \le \text{APPX2} \le \text{APPX1} \le 2\,OPT$.
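A memoised Python sketch of equations (5) and (6) follows, using 1-indexed jobs $a..b$ as in the text. The boundary case of a single job ($a = b$) with $c \ge 2$ cranes is left implicit by the equations; our sketch assumes $B[c, a, a] = P_a$. The function name is hypothetical.

```python
from functools import lru_cache
from typing import Sequence

def appx3(P: Sequence[int], m: int) -> int:
    """Equations (5)-(6) with memoisation; jobs a..b are 1-indexed as in the text.

    Assumption: a single job takes P_a time whatever c is (B[c, a, a] = P_a).
    """
    prefix = [0]
    for p in P:
        prefix.append(prefix[-1] + p)

    @lru_cache(maxsize=None)
    def B(c: int, a: int, b: int) -> int:
        if c == 1 or a == b:
            return prefix[b] - prefix[a - 1]                 # (5): T[a, b]
        # S1 = a..k first, then S2 = k+1..b, both on all c cranes
        best = min(B(c, a, k) + B(c, k + 1, b) for k in range(a, b))
        # S1 on the leftmost p cranes, S2 on the rightmost c - p, in parallel
        for p in range(1, c):
            for k in range(a, b):
                best = min(best, max(B(p, a, k), B(c - p, k + 1, b)))
        return best

    return B(m, 1, len(P))

print(appx3([4, 4, 4, 1, 1, 1], 3))  # 5, the optimum L for the Figure 4 instance
```

On this instance the sequential split at $k = m$ recovers exactly the optimal two-part schedule described above.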



Experimental results. To examine the practical performance of the three 2-approximation algorithms, the following experiments were conducted on a Pentium IV 2.40 GHz machine, with programs coded in C++. Since real instances are hard to obtain and control, we adopt the four groups of test instances, with $m = 5$, 10, 15 and 20, that were randomly generated in [13].

Let APPX1, APPX2 and APPX3 denote the three 2-approximation algorithms, and let SA denote the simulated annealing heuristic proposed in [13]. Figure 5 summarizes the differences in their performance, in terms of the average gaps from the lower bounds published in [13], over instances with different numbers of cranes. As expected, APPX3 is superior to APPX2 and APPX1.







A. Lim, B. Rodrigues, and Z. Xu 




Fig. 5. Comparison of performance among the approximation algorithms (APPX1, APPX2, APPX3) and the simulated annealing heuristic (SA), plotted against the number of cranes (m)



In addition, APPX3 exhibited stability, since its gaps varied only slightly, from 3.16% to 3.78%, as the number of cranes increased. When $m$ increases to 15, APPX3 even outperforms the SA heuristic, which has the best performance among the methods proposed in [13]. Given that only an approximation factor of 2 was proved for APPX3, its practical performance far exceeds this guarantee. Moreover, the schedules provided by APPX3 exhibited a regular form requiring little crane movement, a feature that is useful in applications.
Abstract. Consider a given undirected graph $G = (V, E)$ with non-negative edge costs, a root node $r \in V$, and a set $D \subseteq V$ of demands, with $d_v$ representing the units of flow that demand $v \in D$ wishes to send to the root. We are also given $K$ types of cables, each with a specified capacity and cost per unit length. The single-sink buy-at-bulk (SSBB) problem asks for a low-cost installation of cables along the edges of $G$, such that the demands can simultaneously send their flows to the sink/root $r$. The problem is studied with and without the restriction that the flow from a node must follow a single path to the sink (indivisibility constraint). We are allowed to install zero or more copies of a cable type on each edge. The SSBB problem is NP-hard. In this paper, we present a 145.6-approximation for the SSBB problem, improving the previous best ratio of 216. For the divisible SSBB (DSSBB) problem, we improve the previous best ratio of 72.8 to $\alpha_K$, where $\alpha_K$ is less than 65.49 for all $K$. In particular, $\alpha_2 < 12.7$, $\alpha_3 < 18.2$, $\alpha_4 < 23.8$, $\alpha_5 < 29.3$, $\alpha_6 < 33.9$.



1 Introduction 

Consider a given undirected graph $G = (V, E)$ with non-negative edge costs, a
root node $r \in V$, and a set $D \subseteq V$ of demands with $d_v$ representing the units of
flow that demand $v \in D$ wishes to send to the root. We are also given $K$ types
of cables, each with a specified capacity and a cost per unit length. The cost
per unit capacity per unit length of a high-capacity cable is typically less than
that of a low-capacity cable, reflecting “economy of scale”. In other words, it is
cheaper to buy one cable of larger capacity than many cables of smaller capacity
adding up to the same capacity. The extensively studied single-sink buy-at-bulk
(SSBB) problem, also known as the single-sink edge installation problem, asks
for a low-cost installation of cables along the edges of $G$, such that the demands
can simultaneously send their flows to the sink/root $r$, under the restriction that
the flow from a node must follow a single path to the sink (indivisibility
constraint). We are allowed to install zero or more copies of a cable type on each
edge. By the divisible SSBB (DSSBB) problem, we refer to the version of the SSBB
problem without the indivisibility constraint.

The SSBB problem has applications in the hierarchical design of
telecommunication networks, in which the traffic from a source must follow a
single path to the sink. The DSSBB problem has its own applications: a classic
application is routing oil from several oil wells to a major refinery [8].

The buy-at-bulk network design problem was introduced by Salman,
Cheriyan, Ravi and Subramanian [8]. They showed that the problem is NP-hard
by a simple reduction from the Steiner tree problem or the knapsack problem.
The problem remains NP-hard even when only one cable type is available. They
also presented an $O(\log n)$-approximation algorithm for the SSBB problem in
Euclidean graphs. For problem instances in general metric spaces, Awerbuch and
Azar [1] presented an $O(\log^2 n)$-approximation algorithm. Their algorithm works
even for the multi-root version of the problem. Bartal’s tree embeddings [2] can be
used to improve their ratio to $O(\log n \log\log n)$. Garg et al. [3] presented an
$O(K)$-approximation algorithm based on LP rounding. Guha, Meyerson and
Munagala [4] presented the first constant-factor approximation algorithm, whose
ratio was estimated to be around 2000 by Talwar [9]. In the same paper, Talwar
presented an LP-based rounding algorithm with an improved ratio of 216.

Recently, Gupta, Kumar and Roughgarden [5] presented a simple and elegant
72.8-approximation algorithm for the SSBB problem. Unfortunately, their
approach does not guarantee that the flow from a node follows a single path to
the sink. In other words, their ratio of 72.8 holds for the DSSBB problem, but
not for the SSBB problem. That leaves Talwar’s ratio of 216 as the current best
for the SSBB problem.

In this paper, we design a 145.6-approximation algorithm for the SSBB problem.
We use ideas from Gupta, Kumar and Roughgarden [5], but guaranteeing the
indivisibility constraint is not straightforward. We introduce a new
“redistribution” procedure which is pivotal in guaranteeing that the flow from a
source follows a single path to the sink. We also propose a modification to their
DSSBB algorithm that reduces the ratio from 72.8 to $\alpha_K$, where $\alpha_K$ is less than
65.49 for all $K$. In particular, $\alpha_2 < 12.7$, $\alpha_3 < 18.2$, $\alpha_4 < 23.8$, $\alpha_5 < 29.3$, $\alpha_6 < 33.9$.



2 Preliminaries 

Let $G = (V, E)$ be the input graph with $D \subseteq V$ being the set of demands. We
use the terms vertices and nodes interchangeably. Also, depending upon the
context, we use the term “demand” to denote a vertex or the flow out of it. Let
$c_e$ denote the length of edge $e$. We also use $c_{xy}$ to denote the length of an edge
connecting nodes $x$ and $y$. We use the metric completion of the given graph. Let
$u_i$ and $\sigma_i$ denote the capacity and cost per unit length of cable type $i$. We define
$\delta_i = \sigma_i/u_i$ to be the “incremental cost” of using cable type $i$. The value of $\delta_i$ can
also be interpreted as the cost per unit capacity per unit length of cable type $i$.
Let us assume that each $u_i$ and $\sigma_i$ (and by definition $\delta_i$) is a power of $1+\epsilon$,
$\epsilon > 0$, which can be enforced by rounding each capacity $u_i$ down to the nearest
power of $1+\epsilon$, and each $\sigma_i$ up to the nearest power of $1+\epsilon$. This assumption is
not without loss of generality, but it can be accounted for by losing a factor of
$(1+\epsilon)^2$ in the approximation ratio. We will choose $\epsilon$ later. This idea of rounding
is derived from [5], where powers of 2 were used, thus effectively choosing $\epsilon = 1$.

The following properties of the costs and capacities of cable types are known
[4,5]. Without loss of generality, assume that the cables are ordered such that
$u_i < u_j$ and $\sigma_i < \sigma_j$ for all $i < j$. Note that if $u_i < u_j$ and $\sigma_i > \sigma_j$, then we
can eliminate cable type $i$ from consideration. We can also assume that
$u_1 = \sigma_1 = 1$, as this can be obtained by appropriate scaling, though it may leave
non-integer weights at vertices. For each $j < k$,

$$\frac{\sigma_k}{u_k} \le \frac{\sigma_j}{u_j}. \qquad (1)$$



Otherwise, cable type $k$ can be replaced by $u_k/u_j$ copies of cable type $j$ without
an increase in cost. The fact that $\delta_j = \sigma_j/u_j$ is a power of $1+\epsilon$ implies that
$\delta_{j+1} \le \delta_j/(1+\epsilon)$ for all $j$, since $u_{j+1} \ge (1+\epsilon)u_j$. Let $g_k = \sigma_{k+1}/\delta_k$. By
equation (1),

$$1 = u_1 < g_1 < u_2 < g_2 < \cdots < u_K < g_K = \infty.$$



Since $\sigma_i$ is a power of $1+\epsilon$ for any $i$, and $\sigma_{j+1} > \sigma_j$, using equation (1) we get

$$\frac{u_{j+1}}{u_j} \ge 1+\epsilon.$$
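The rounding and ordering conventions above can be illustrated with a short Python sketch. It is a minimal illustration under our own assumptions, not the authors' code. The helpers `round_to_power`, `round_cables` and `breakpoints` are hypothetical names, and `breakpoints` assumes the breakpoint definition $g_t = \sigma_{t+1}/\delta_t$ with $g_K = \infty$.

```python
import math

def round_to_power(x, base, up):
    """Round x > 0 down (up=False) or up (up=True) to an integer power of base > 1."""
    k = math.floor(math.log(x, base))
    # Correct any floating-point error in the logarithm.
    while base ** (k + 1) <= x:
        k += 1
    while base ** k > x:
        k -= 1
    if up and base ** k < x:
        k += 1
    return base ** k

def round_cables(capacities, costs, eps):
    """Round each capacity u_i down and each cost sigma_i up to a power of
    1 + eps; return (u_i, sigma_i, delta_i = sigma_i / u_i) sorted by capacity."""
    base = 1 + eps
    cables = []
    for u, sigma in zip(capacities, costs):
        ru = round_to_power(u, base, up=False)
        rs = round_to_power(sigma, base, up=True)
        cables.append((ru, rs, rs / ru))
    return sorted(cables)

def breakpoints(cables):
    """Hypothetical helper; assumes g_t = sigma_{t+1} / delta_t and g_K = infinity."""
    g = [cables[t + 1][1] / cables[t][2] for t in range(len(cables) - 1)]
    return g + [math.inf]
```

On a toy instance with capacities 1, 5, 40, costs 1, 2, 5 and $\epsilon = 1$, the rounded cables are $(u, \sigma, \delta) = (1, 1, 1)$, $(4, 2, 0.5)$ and $(32, 8, 0.25)$. The breakpoints $2, 16, \infty$ then interleave with the capacities as in the chain above.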



Let OPT denote an optimal solution with cost $C^* = \sum_j C^*(j)$, where $C^*(j)$
is the amount paid for cable type $j$ in OPT. We state the following lemma and
its proof from [5], since understanding it is crucial for understanding our
algorithms.



Lemma 1 (Redistribution Lemma [5]). Let $T$ be a tree rooted at $r$ with each
edge having capacity $U$. For each vertex $j \in T$, let $w(j) \le U$ be the weight located
at $j$, with $\sum_j w(j)$ a multiple of $U$. Then there is an efficiently computable
(random) flow on the tree that redistributes weights without violating edge
capacities, so that each vertex receives a new weight $w'(j)$ that is either 0 or $U$.
Moreover,

$$\Pr[w'(j) = U] = w(j)/U.$$



Proof. Replace each edge in $T$ with two oppositely directed arcs. Let $Y$ be a
value chosen uniformly at random from $(0, U]$. Take an Euler tour of the vertices
in $T$, starting from $r$ and visiting all the other vertices $\{j_1, j_2, \ldots, j_m\}$ in $T$. Let a
counter $Q$ be set to 0 initially. On visiting vertex $j_k$, we update $Q \leftarrow Q + w(j_k)$.
Also, let $Q_{old}$ and $Q_{new}$ be the value of $Q$ just before and after visiting $j_k$,
respectively. On visiting $j_k$, if $xU + Y \in (Q_{old}, Q_{new}]$ for some integer $x$, then
“mark” $j_k$ and ask that it send $Q_{new} - (xU + Y)$ weight to the next marked
vertex lying clockwise on the tour. Otherwise, we ask that $j_k$ send all its
weight to the next marked vertex lying clockwise on the tour. This construction
ensures that the maximum flow on a directed arc is at most $U$, and that
the probability that a vertex $j$ gets marked is $w(j)/U$, which is exactly the
probability that $j$ receives a weight of $U$.

Since we were working on a directed tour, the cost of this redistribution is at
most twice the cost of the tree $T$, as each edge in $T$ was replaced by two oppositely
directed arcs. However, a simple flow-canceling argument shows that one
copy of the edges in $T$ is sufficient for such a redistribution. ■
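The Euler-tour construction in the proof of Lemma 1 can be sketched in Python as follows. This is an illustration under stated assumptions: the tour is given only as the list of vertex weights in visiting order (the tree itself is not modelled), and the function names are hypothetical. `redistribute` computes which vertices are marked and the flow on each tour arc for a fixed offset $Y$. `marking_frequencies` estimates the marking probabilities by sampling $Y$.

```python
import math
import random
from fractions import Fraction   # exact arithmetic for small worked examples

def redistribute(weights, U, Y):
    """Euler-tour redistribution from the proof of Lemma 1 (a sketch).

    weights: w(j_1), ..., w(j_m) in the order the tour visits the vertices;
             each is at most U and their sum is a multiple of U.
    Y:       the offset in (0, U] that the lemma draws uniformly at random.
    Returns (marked, carry): marked[k] is True iff j_k ends up with weight U,
    and carry[k] is the flow on the tour arc leaving j_k.
    """
    marked, carry = [], []
    q = 0
    for w in weights:
        q_old, q_new = q, q + w
        # Largest threshold of the form xU + Y that is <= q_new.
        x = math.floor((q_new - Y) / U)
        marked.append(x * U + Y > q_old)     # threshold lies in (q_old, q_new]
        # Weight accumulated since that threshold still has to travel on;
        # weight collected before the first threshold wraps around the tour.
        carry.append((q_new - Y) % U)
        q = q_new
    return marked, carry

def marking_frequencies(weights, U, trials, seed=0):
    """Monte Carlo estimate of Pr[j_k is marked] over a uniform Y in (0, U]."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(trials):
        Y = U - rng.random() * U             # random() is in [0, 1), so Y is in (0, U]
        marked, _ = redistribute(weights, U, Y)
        for k, m in enumerate(marked):
            counts[k] += m
    return [c / trials for c in counts]
```

Consider weights 1/2, 1/4, 3/4, 1/2 with $U = 1$ and $Y = 1/2$. The first and third vertices are marked, and every arc carries less than $U$. Over random $Y$, the empirical marking frequencies approach $w(j)/U$.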

3 The Algorithms 

We first show how to modify the algorithm of Gupta, Kumar and Roughgarden [5]
to obtain an approximation ratio of $\alpha_K$, where $\alpha_K$ is less than 65.49 for all $K$.
Recall that their algorithm solves the DSSBB problem, and not the SSBB problem.
We then present our main result in Section 3.2: an approximation algorithm for
the SSBB problem that achieves an approximation ratio of 145.6.



3.1 The DSSBB Problem 

The vertices of the graph $G = (V, E)$ may have non-integer weights as demands,
because of the scaling done to make $u_1 = \sigma_1 = 1$. Since the flow is divisible, there
is no loss of generality in assuming that $d_j \le 1$, because a demand greater than
1 can be split into multiple demands by splitting a vertex into $\lceil d_j \rceil$ vertices. The
algorithm is simpler with this assumption, and easily adapts to higher demands
by adjusting the probabilities without actually splitting vertices.

Construct a $\rho$-approximate Steiner tree $T_0$, using cables with capacity $u_1 = 1$,
spanning all the demands in $D$. Redistribute the demands using the construction
in the proof of Lemma 1, with $U = 1$, and collect integral demands at some subset
of vertices in $D$. The cost incurred by this redistribution is just the cost of the
Steiner tree [5], and since the optimal solution contains a candidate Steiner tree,
we incur a cost of at most $\rho \sum_j C^*(j)/\sigma_j$. We can assume that the number
of demands $|D|$ is a power of $1+\epsilon$, as this can be achieved by placing dummy
demands at the root vertex $r$.

The algorithm given below closely follows the incremental design of Gupta et
al.’s algorithm [5] to build the network. The algorithm proceeds in stages, using
only cable types $t$ and $t+1$ in stage $t$, except for the last stage ($t = K$), in which
only cables of type $K$ are used.

At the beginning of the first stage, $D_1 = D$, with each demand $j \in D$ having
weight $d_j = 1 = u_1$. In general, at the beginning of stage $t$, $D_t$ is a set of $|D|/u_t$
vertices, each with demand $u_t$. During stage $t$, our algorithm (presented below)
uses $u_{t+1}$ as the “aggregation threshold” to combine several demands of weight
$u_t$ into a single demand of weight $u_{t+1}$. In [5], capacities are powers of 2, which
ensures that $u_{t+1}$ is an integral multiple of $u_t$. In our algorithm, by contrast, $u_{t+1}$
is not necessarily an integral multiple of $u_t$. As a result, during the aggregation
process our algorithm may have to combine demands of weight $u_t$ from, say, 1.33
vertices to obtain a demand of weight $u_{t+1}$. The cables required to perform such
an aggregation are bought by the algorithm. The demand will reach the root at
the end of the algorithm. The final solution is then given by the union of all the
paths used in the aggregation stages.
Given below are the steps performed at stage $t$ of the algorithm. Its first
three steps are exactly the same as in [5]. Whenever we mention a fraction of a
vertex, we mean a fraction of the demand from that vertex.

D1. Mark each demand in $D_t$ with probability $p_t = u_t/g_t$, and let $D_t'$ be the
marked demands.

D2. Construct a $\rho$-approximate Steiner tree $T_t$ on $F_t = D_t' \cup \{r\}$. Install a cable
of type $t+1$ on each edge of this tree.

D3. For each vertex $j \in D_t$, send its weight $u_t$ to the nearest member of $F_t$ using
cables of type $t$. Let $w_t(i)$ be the weight collected at $i \in F_t$. All vertices
that sent their demands to the same vertex $i$ are said to belong to $i$’s family,
which we call $G_i$.

D4. A vertex $i \in F_t$ collects the demands sent to it by all vertices in its family
$G_i$, and divides them into groups of size $u_{t+1}$. Each member of $G_i$ may partition
its flow and contribute to at most two groups. Flow from a group $g$ is sent
back to a random vertex of $g$ by buying a new cable of type $t+1$. If the
whole vertex belongs to $g$, then the probability that that vertex receives
back a weight of $u_{t+1}$ is $u_t/u_{t+1}$. But if only a fractional part $f$ of a vertex
demand belongs to $g$, then the probability that that vertex receives back
a weight of $u_{t+1}$ is $f u_t/u_{t+1}$. Some residual demand may be left over at $i$,
which will be aggregated into demands of $u_{t+1}$ using redistribution in the
next step. Let the number of residual vertices at $i$ be $b_i$.

D5. After rerouting the collected weight back from $i$ to vertices in $D_t$ in the
above step for all $i \in F_t$, we aggregate the weights from residual vertices
into groups of weight exactly $u_{t+1}$ using Lemma 1 with $T = T_t$, $w_t(i) = b_i u_t$
and $U = u_{t+1}$. For every $i \in F_t$ that receives weight $u_{t+1}$ as a result of this
aggregation process, send the weight back from $i$ to one of $i$’s $b_i$ residual
vertices, chosen uniformly at random, using newly bought cable of type
$t+1$. If the whole vertex is a residual vertex, then the probability that
that residual vertex receives back the weight of $u_{t+1}$ is $1/b_i$. But if only a
fractional part $f$ of a vertex demand is residual, then the probability that
that vertex receives the weight of $u_{t+1}$ is $f/b_i$. In this scheme, a vertex $j$
may receive back a weight of $2u_{t+1}$ in stage $t$ as a result of it being in two
groups, which can be viewed as duplicating the vertex.
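Step D4 can be sketched as below. The sketch assumes that the members of a family are processed in a fixed order and that demands are exact rationals (`fractions.Fraction`). The helpers `form_groups` and `receive_probability` are hypothetical. The sketch cuts the family's weight into consecutive groups of exactly $u_{t+1}$, so each vertex contributes to at most two groups. It then computes the probability that each vertex gets a group's weight back, which is the case (i)/(ii) computation in the proof of Lemma 2.

```python
from fractions import Fraction

def form_groups(family, u_t, u_next):
    """Cut the weights u_t of the vertices in `family` (in the given order)
    into consecutive groups of total weight exactly u_next.
    Returns (groups, residual); both hold (vertex, amount) pairs, and a vertex
    can appear in at most two groups because u_t <= u_next."""
    u_t, u_next = Fraction(u_t), Fraction(u_next)
    groups, current, filled = [], [], Fraction(0)
    for v in family:
        remaining = u_t
        while remaining > 0:
            take = min(remaining, u_next - filled)
            current.append((v, take))
            filled += take
            remaining -= take
            if filled == u_next:            # group complete
                groups.append(current)
                current, filled = [], Fraction(0)
    return groups, current                  # leftover pieces are residual

def receive_probability(groups, u_next):
    """Pr[vertex gets a group's weight back]: a vertex holding `amount` of a
    group is chosen as that group's receiver with probability amount / u_next."""
    prob = {}
    for group in groups:
        for v, amount in group:
            prob[v] = prob.get(v, Fraction(0)) + amount / Fraction(u_next)
    return prob
```

Take $u_t = 1$ and $u_{t+1} = 3/2$ (powers of $1+\epsilon$ for $\epsilon = 1/2$) and a family of four vertices. Two groups are formed, the second vertex is split between them, and the fourth vertex is residual. Every grouped vertex gets $u_{t+1}$ back with probability $u_t/u_{t+1} = 2/3$.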

At stage $t$, since the $u_{t+1}$ demands from $i \in F_t$ for all $i$ are returned to a
subset of vertices in $D_t$, we have $D_{t+1} \subseteq D_t$ for all $t$. When $t = K$, we set $p_K = 0$.
Hence, in the $K$th stage of the algorithm none of the demands are marked, and
thus the weights of all vertices in $D_K$ are sent directly to the root $r$ using cables of
type $K$. We use the following lemmas to analyze our algorithm.

The lemma below appears as Lemma 4.2 in [5]. However, its proof in [5] is not
directly applicable to our algorithm, because the value of $u_{t+1}$ in our algorithm
is not necessarily an integral multiple of the value of $u_t$.

Lemma 2. For every non-root vertex $j \in D$ and stage $t$,

$$\Pr[j \in D_t] = 1/u_t.$$
Proof. We prove the lemma by induction on $t$. The lemma is clearly true for the
base case $t = 1$, since $u_1 = 1$. Suppose the lemma is true for stage $t$. We will
show that it is true for stage $t+1$. In stage $t$, let $j \in D_t$ send its weight to $i \in F_t$.
Vertex $j$ must satisfy one of the following conditions: (i) $j$ belongs to just one
group, (ii) $j$ belongs to two different groups, (iii) $j$ is fully a residual vertex, or
(iv) part of $j$ belongs to a group while the rest of $j$ is residual. Recall that a
vertex can belong to at most 2 groups.

In case (i), the probability that $j$ receives back the group weight of $u_{t+1}$ is
$u_t/u_{t+1}$. In case (ii), let a fraction $f$ of $j$ belong to group $g_1$ and the rest of
$j$ belong to group $g_2$. The probability that $j$ receives back $g_1$’s weight of $u_{t+1}$
is $f u_t/u_{t+1}$, while the probability that $j$ receives back $g_2$’s weight of $u_{t+1}$ is
$(1-f)u_t/u_{t+1}$. Overall, the probability that $j$ receives back a weight of $u_{t+1}$
is $u_t/u_{t+1}$. In case (iii), the probability that $i$ is assigned the weight of $u_{t+1}$ is
$b_i u_t/u_{t+1}$, and the probability that $j$ receives this weight from $i$ is $1/b_i$, thus
making the overall probability that $j$ receives a weight of $u_{t+1}$ equal to $u_t/u_{t+1}$.
By a similar argument, the probability that $j$ receives a weight of $u_{t+1}$ in
case (iv) is $u_t/u_{t+1}$. Thus, we conclude that

$$\Pr[j \in D_{t+1}] = \Pr[j \in D_{t+1} \mid j \in D_t] \cdot \Pr[j \in D_t] = \frac{u_t}{u_{t+1}} \cdot \frac{1}{u_t} = \frac{1}{u_{t+1}}.$$



The following lemma, proved by Gupta, Kumar and Roughgarden [5], applies
to our algorithm as well. The proof involves taking all cables of higher capacity
used by an optimal solution, extending them using randomization to span
$F_t$, and showing that this solution has low expected cost.

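Lemma 3 ([5]). Let $T_t^*$ be the optimal Steiner tree on $F_t$, and $c(T_t^*) = \sum_{e \in T_t^*} c_e$.
Then

$$E[c(T_t^*)] \le \sum_{s > t} \frac{1}{\sigma_s} C^*(s) + \sum_{s \le t} \frac{1}{\delta_s g_t} C^*(s). \qquad (2)$$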



Lemma 4. The expected cost incurred in stage $t$ is at most $\left(2 + \rho + \frac{2}{1+\epsilon}\right)$ times
$\sigma_{t+1} E[c(T_t^*)]$, where $T_t^*$ is the optimal Steiner tree on $F_t$.

The proof of the above lemma is given as Lemma 4.4 in [5] with $\epsilon = 1$. The cost
incurred during stage $t$ is accounted for as follows: (i) the cost of the cables to
construct the Steiner tree in Step D2 is at most $\rho\sigma_{t+1}c(T_t^*)$, (ii) the cost of the
cables used in Step D3 is at most $2\sigma_{t+1}c(T_t^*)$, and (iii) the cost incurred in Steps
D4 and D5 to reroute the demands back to random vertices in $D_t$ is at most

$$\frac{\delta_{t+1}}{\delta_t} \cdot 2\sigma_{t+1}c(T_t^*) \le \frac{1}{1+\epsilon} \cdot 2\sigma_{t+1}c(T_t^*).$$
Theorem 1. The approximation ratio $\alpha_K$ of our DSSBB algorithm is at most
65.49.

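Proof. Recall that by rounding the costs and capacities of cables to powers of
$1+\epsilon$, we lost a factor of $(1+\epsilon)^2$ in the approximation ratio. We incurred a
cost of

$$C_{T_0} \le \rho \sum_{s=1}^{K} \frac{C^*(s)}{\sigma_s}$$

for the construction of the Steiner tree $T_0$ to ensure integral demands at vertices.
The total cost $C_S$ incurred during the $K$ stages of the algorithm can be obtained
by substituting equation (2) in Lemma 4 and summing over all $t$, as shown below.

$$C_S \le \left(2 + \rho + \frac{2}{1+\epsilon}\right) \left( \sum_{s=2}^{K} \sum_{t=1}^{s-1} \frac{\sigma_{t+1}}{\sigma_s} C^*(s) + \sum_{s=1}^{K} \sum_{t \ge s} \frac{\delta_t}{\delta_s} C^*(s) \right)$$

The cost of the final solution is given by

$$C = A(C_{T_0} + C_S) \le A \left[ \sum_{s=1}^{K} \frac{\rho}{\sigma_s} C^*(s) + \left(2 + \rho + \frac{2}{1+\epsilon}\right) \left( \sum_{s=2}^{K} \sum_{t=1}^{s-1} \frac{\sigma_{t+1}}{\sigma_s} C^*(s) + \sum_{s=1}^{K} \sum_{t \ge s} \frac{\delta_t}{\delta_s} C^*(s) \right) \right],$$

where $A = (1+\epsilon)^2$. Since $\sigma_t$ and $\delta_t$ are powers of $1+\epsilon$, each inner summation is
upper bounded by $1 + 1/\epsilon$. The term $\rho C^*(s)/\sigma_s$ is absorbed as well. It is at most
$\rho(1+\epsilon)^{1-s}$ because $\sigma_s \ge (1+\epsilon)^{s-1}$. Meanwhile, the first inner summation for $s$ has
only $s-1$ terms, so it falls short of $1 + 1/\epsilon$ by at least $(1+\epsilon)^{1-s}(1 + 1/\epsilon)$. Multiplied
by $2 + \rho + 2/(1+\epsilon) > \rho$, this shortfall covers the term. This simplifies the above
equation to

$$C \le (1+\epsilon)^2 \left(2 + \rho + \frac{2}{1+\epsilon}\right) \cdot 2\left(1 + \frac{1}{\epsilon}\right) \sum_s C^*(s), \qquad (3)$$

which when optimized for $\epsilon$ gives a ratio of 65.4899 for $\epsilon \approx 0.585735$. Here we
are using the current best approximation algorithm for finding a Steiner tree,
which guarantees an approximation ratio of $\rho = 1 + \ln(3)/2$ [7]. ■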
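As a numerical check of the last step, the sketch below evaluates the right-hand side of equation (3), divided by $\sum_s C^*(s)$, and minimizes it over a grid of $\epsilon$ values. It assumes $\rho = 1 + \ln(3)/2$ as in the text, and the function names are ours. On this reading of (3), the minimum is about 65.49. The grid minimizer it reports may differ slightly from the $\epsilon$ quoted above, because the bound is flat near its optimum. At $\epsilon = 1$ the same expression gives about 72.8, and twice that is about 145.6.

```python
import math

RHO = 1 + math.log(3) / 2   # Steiner tree ratio rho = 1 + ln(3)/2 quoted in the text

def ratio_bound(eps, rho=RHO):
    """Coefficient of sum_s C*(s) on the right-hand side of equation (3)."""
    return (1 + eps) ** 2 * (2 + rho + 2 / (1 + eps)) * 2 * (1 + 1 / eps)

def best_ratio(lo=0.01, hi=3.0, steps=29900):
    """Grid search over eps in [lo, hi]; returns (smallest bound, its eps)."""
    grid = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min((ratio_bound(eps), eps) for eps in grid)
```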



Corollary 1. For a fixed $K$, $\alpha_2 < 12.7$, $\alpha_3 < 18.2$, $\alpha_4 < 23.8$, $\alpha_5 < 29.3$, $\alpha_6 < 33.9$,
and so on.



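Proof. The cost of the final solution $C = (1+\epsilon)^2(C_{T_0} + C_S)$ can be rewritten as

$$C \le (1+\epsilon)^2 \left[ \sum_{s=1}^{K} \frac{\rho}{\sigma_s} C^*(s) + \left(2 + \rho + \frac{2}{1+\epsilon}\right) \left( \sum_{t \ge 1} C^*(t) + \sum_{s=2}^{K} \sum_{t=1}^{s-1} \frac{\sigma_{t+1}}{\sigma_s} C^*(s) + \sum_{s=1}^{K} \sum_{t > s} \frac{\delta_t}{\delta_s} C^*(s) \right) \right].$$

For a fixed $K$, there exists an $\epsilon > 0$ for which the corollary can be
mathematically verified. ■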
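For fixed $K$, the bracketed expression in the proof of Corollary 1 can be bounded numerically. The sketch below is our own reading, not the authors' computation. It assumes the extreme case $\sigma_s = (1+\epsilon)^{s-1}$ and $\delta_t = (1+\epsilon)^{1-t}$, which upper-bounds every ratio $\sigma_{t+1}/\sigma_s$, $\delta_t/\delta_s$ and $1/\sigma_s$ allowed by the rounding. It then takes the worst coefficient over $s$. The name `fixed_k_bound` is hypothetical. For $K = 2, \ldots, 5$, the values it returns at a very small $\epsilon$ fall just below the stated bounds.

```python
import math

RHO = 1 + math.log(3) / 2   # rho = 1 + ln(3)/2, as in the text

def fixed_k_bound(K, eps, rho=RHO):
    """Largest coefficient of C*(s), s = 1..K, in the Corollary 1 expression,
    assuming sigma_s = (1 + eps)**(s - 1) and delta_t = (1 + eps)**(1 - t)."""
    r = 1 / (1 + eps)
    b = 2 + rho + 2 / (1 + eps)
    worst = 0.0
    for s in range(1, K + 1):
        # t = 1..s-1: sigma_{t+1} / sigma_s = r**(s - 1 - t)
        below = sum(r ** (s - 1 - t) for t in range(1, s))
        # t > s: delta_t / delta_s = r**(t - s)
        above = sum(r ** (t - s) for t in range(s + 1, K + 1))
        coeff = rho * r ** (s - 1) + b * (1 + below + above)   # 1 is the t = s term
        worst = max(worst, coeff)
    return (1 + eps) ** 2 * worst
```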
3.2 The SSBB Problem

During the preprocessing step, Gupta et al.’s algorithm [5] and our DSSBB
algorithm in Section 3.1 use redistribution on $T_0$ to guarantee integral demands at
the vertices. Later, vertices of integral demands are duplicated so that the
demands at vertices have unit weight. Because of this redistribution and duplication,
there is no guarantee that the demand from a vertex in the input graph travels
along a single path to the sink, as the demand at a vertex may have been split
during the redistribution and/or duplication process. In our algorithm below, we
make sure that the demand at a vertex follows a single path to the sink. Like [5], we
set $\epsilon = 1$, which makes $u_i$ and $\sigma_i$ (and by definition $\delta_i$) powers of 2. This gives
us the flexibility of generating $u_{i+1}$-weighted nodes from an integral number of
$u_i$-weighted nodes, thereby eliminating splitting of demands.

In what follows, we present a sequence of lemmas which help in guaranteeing
the indivisibility constraint. Recall that Lemma 1 redistributes the weights
uniformly at random, and the probability that a vertex receives a weight of $U$ is
proportional to its weight.

Lemma 5. Either there exists at least one arc with zero flow in the directed tour
$t$ constructed in the procedure of Lemma 1, or there exists a redistribution (using
Lemma 1) with zero flow on at least one arc of the directed tour, which produces
exactly the same assignment of weights.

Proof. The proof is complete if the first part of the lemma is true. Suppose
it is not. Let $t$ be the directed tour in the procedure of Lemma 1 which
was used to redistribute the weights. Let $m > 0$ be the smallest flow across a
directed arc in $t$. Note that $m < U$. For each directed arc in $t$, subtract $m$
from the flow on that arc. After this, we are guaranteed that at least one arc
in $t$ has zero flow. Since this post-processing does not alter the distribution of
weights, the proof is complete. ■

Lemma 6. There exists a redistribution using the procedure of Lemma 1 with
$Y = U$ which produces exactly the same assignment of weights as that with $Y$
chosen uniformly at random from $(0, U]$.

Proof. Let $t$ be the directed tour in the proof of Lemma 1. By Lemma 5,
there exists at least one arc in $t$ with zero flow. Let $e$ be an arc in $t$ from
vertex $p$ to vertex $q$ with zero flow. Without loss of generality, we can assume
that $p \in D$. By the construction in the proof of Lemma 1, $p$ must be one of
the marked vertices. Since the flow on $e$ is zero, $Q_{new}$ at $p$ must be equal to
$xU + Y$ for some integer $x$. This means that the vertex $q$ marked just after $p$
must have either $(x+1)U + Y \in (Q_{old}, Q_{new}]$ or $Y \in (Q_{old}, Q_{new}]$, where $Q_{old}$
and $Q_{new}$ are taken at $q$. Hence $Q_{new}$ at $q$ is at least $U$ greater than
$Q_{new}$ at $p$.

Recall from the proof of Lemma 1 that the vertices in $t$ are visited starting
from $r$. We now show that a construction with $Y = U$ on $t$, visiting vertices
starting from $q$ (instead of $r$), produces exactly the same assignment of weights
as that with $Y$ chosen uniformly at random from $(0, U]$. From the above
discussion, since $Q_{new}$ at $q$ is at least $U$ greater than $Q_{new}$ at $p$, and the flow
on $e$ is zero, the construction with $Y = U$ on $t$ visiting vertices starting from $q$
produces exactly the desired outcome. That is, the set of vertices that are assigned
a weight of $U$ is exactly the same as the set marked in the proof of Lemma 1. ■
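The effect of Lemmas 5 and 6 can be checked on small examples with the Python sketch below. The helper names are hypothetical, weights are exact fractions, and the tour is again represented only by the list of weights in visiting order. `marks_after_restart` first finds the arc with the smallest flow, which carries zero flow once that minimum is subtracted. It then restarts the tour just after that arc with $Y = U$ and maps the marks back to the original order. The test compares the result with the original marking for many random inputs.

```python
import math
import random
from fractions import Fraction

def marks(weights, U, Y):
    """Marked vertices of the Lemma 1 procedure for tour order `weights`."""
    out, q = [], 0
    for w in weights:
        x = math.floor((q + w - Y) / U)      # largest x with xU + Y <= q + w
        out.append(x * U + Y > q)            # threshold lies in (q, q + w]
        q += w
    return out

def marks_after_restart(weights, U, Y):
    """Lemmas 5 and 6 sketch: cancel the smallest arc flow so that the arc
    leaving some vertex p is empty, restart the tour just after p with Y = U,
    and report the resulting marks in the original tour order."""
    n, q, carry = len(weights), 0, []
    for w in weights:
        q += w
        carry.append((q - Y) % U)            # flow on the arc leaving this vertex
    p = min(range(n), key=carry.__getitem__)
    order = [(p + 1 + k) % n for k in range(n)]
    rotated = marks([weights[i] for i in order], U, U)
    result = [False] * n
    for i, m in zip(order, rotated):
        result[i] = m
    return result
```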

The following lemma is easier than it appears, and differs from Lemma 1 in
two aspects: (i) the weights of vertices in $T$ are powers of 2, and (ii) the
demand from a vertex is not split.

Lemma 7. Let $T$ be a tree with each edge having capacity $U$, a power of two.
For all $v$ in tree $T$, let $w(v)$ be a power of 2 with $w(v) \le U$. Then there is
an efficiently computable flow on $T$ that redistributes the weights, respecting the
cable capacity and without splitting a vertex weight, so that each vertex receives
a new weight $w'(j)$ that is either 0 or $U$. Moreover,

$$\Pr[w'(j) = U] = w(j)/U.$$

Proof. Using the argument in Lemma 6, we find a starting vertex from which we
start visiting the vertices in the directed tour (obtained by replacing each edge
in $T$ with two oppositely directed arcs) in clockwise direction with $Y = U$. The
value of $Q$ is set to 0 initially. Increment $Q$ by $w(j)$ on visiting vertex $j$. Let $Q_{old}$
and $Q_{new}$ be the value of $Q$ just before and after visiting a vertex, respectively.
Also, maintain a set $Z$ which is initially empty, and add $j$ to $Z$ on visiting vertex $j$.
On visiting $j$, if $xU \in (Q_{old}, Q_{new}]$ for some integer $x$, then we do the following:
(i) we find $W \subseteq Z$ such that $Q_{new} - xU = \sum_{v \in W} w(v)$, and (ii) we ask the vertices
in $Z \setminus W$ to send their weights to $j$ while removing them from $Z$.

We now show how to find $W \subseteq Z$. Let $g$ be the first vertex at which $Q_{new} \ge U$.
The proof of Lemma 6 would have marked $g$ and asked $g$ to send $Q_{new} - U$ to
the next marked vertex lying clockwise on the tour. We show that there exists a
$W \subseteq Z$ whose removal from $Z$ makes $Q_{new} - \sum_{v \in W} w(v)$ equal to $U$. To do so,
we show that there exists a set $M \subseteq Z$ such that $\sum_{v \in M} w(v) = U$. Recall that
no vertex in $Z$ has a weight more than $U$. To show that such an $M$ exists, all
we need to do is the following. Merge two vertices $a$ and $b$ of the same weight $w$ in
$Z$ into one vertex with weight $2w$. Since $w$ is a power of 2, the weight of the new
vertex remains a power of 2. Continue this merging process until (i) a vertex in
$Z$ has weight $U$ or (ii) no more merging is possible. The former proves our claim.
The latter is not possible, as it contradicts $\sum_{v \in Z} w(v) \ge U$: the remaining
weights would be distinct powers of 2 smaller than $U = 2^{k+1}$, and $\sum_{i=0}^{k} 2^i < 2^{k+1}$.
Once $M$ is found, $W = Z \setminus M$. The vertices in $W$ will be the sole contributors of
the flow from $g$ to the next vertex lying clockwise on the tour. This argument
holds for every vertex $j$ at which $xU \in (Q_{old}, Q_{new}]$ for some integer $x$. Notice
that the probability that a vertex $j \in T$ receives (gets assigned) a weight of $U$ is
$w(j)/U$, which is exactly what we needed, as per the lemma statement.
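The merging argument used to find $M$ can be written as a short procedure. This Python sketch uses a function name of our own, `find_full_subset`. It assumes that $U$ and the weights in $Z$ are powers of two, that each weight is at most $U$, and that the total weight is at least $U$, as in the proof. It repeatedly merges two groups of equal weight until a group of weight exactly $U$ appears.

```python
def find_full_subset(z, U):
    """Return vertices of Z whose weights sum to exactly U, using the merging
    argument in the proof of Lemma 7.

    z: dict vertex -> weight, each weight a power of two at most U (itself a
       power of two), with total weight at least U.
    """
    buckets = {}                       # weight -> list of merged vertex groups
    for v, w in z.items():
        if w == U:
            return [v]
        buckets.setdefault(w, []).append([v])
    while True:
        mergeable = [w for w, groups in buckets.items() if len(groups) >= 2]
        if not mergeable:
            # Distinct powers of two below U sum to less than U.
            raise ValueError("total weight of z is below U")
        w = min(mergeable)
        merged = buckets[w].pop() + buckets[w].pop()
        if 2 * w == U:                 # w < U and both are powers of two
            return merged
        buckets.setdefault(2 * w, []).append(merged)
```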

The proof will be complete once we show that the redistribution can be done
on $T$ rather than on the directed arcs of the Euler tour on $T$. Consider a leaf
node $l \in T$ that is in the Euler tour. Let $h \in T$ be the node that was visited
just before and after $l$ ($h$ is $l$’s parent in $T$, which is rooted at $r$). We use $x'$ and
$x''$ to represent vertex $x \in T$ in the directed Euler tour, with the tour entering
$x'$ and leaving $x''$. Let $f_{h'l'}$ and $f_{l''h''}$ be the flows on the arcs from $h'$ to $l'$ and from $l''$
to $h''$, respectively (see Fig. 1). Suppose that, during the redistribution process, $l$ had
sent all its weight to some vertex that lies clockwise on the tour and was assigned a
weight of $U$. Then ask $h'$ to send the flow $f_{h'l'}$ directly to $h''$ instead of sending it
through $l$. If $l$ was assigned a weight of $U$ in our redistribution process, then ask
the vertices in $W$ to reroute their flow bypassing $l$, i.e., make the flow coming
into $h'$ go directly to $h''$ instead of routing it through $l$. Remove $l$ from $T$, and
repeat this process for all leaf nodes in $T$. Note that whenever a leaf node is
removed from $T$, the flow on the tree edge connecting that node to $T$ is at most
$U$. This process stops when there is just one node left in $T$. This completes the
proof of Lemma 7. ■

Fig. 1. $l$ is a leaf node with $h$ being its parent in $T$. (Diagram of vertex $h$, with tour copies $h'$ and $h''$, and vertex $l$, with tour copies $l'$ and $l''$, showing the flows $f_{h'l'}$ and $f_{l''h''}$ between them.)

Let $G = (V, E)$ be the input graph with root $r \in V$, and let $D \subseteq V$ be the set
of demands with $d_j$ denoting the weight at $j$. Recall that the vertices in $D$ may
have non-integer weights because of the scaling we did to ensure $u_1 = \sigma_1 = 1$.
Construct $T_0$, a $\rho$-approximate Steiner tree spanning $D$, using cables of capacity
$u_1$. Apply the procedure in the proof of Lemma 1 to $T_0$ with $U = u_1$, taking $w(j)$
to be the fractional part of $d_j$ for $j \in D$. This collects demands at some subset of
vertices in $D$ such that the new weight $w'(j)$ of a vertex in $D$ is either 0 or $U$. The
cost of the redistribution is just the cost of $T_0$. Since an optimal solution contains
a candidate Steiner tree, the cost of $T_0$ is at most $\rho \sum_i C^*(i)/\sigma_i$.

As per the redistribution procedure, notice that (i) the weight $w(j)$ of vertex
$j \in T_0$ may have been split and assigned to at most two different nodes in $T_0$,
and (ii) a weight of $U$ is collected at a vertex $j \in D$ only if $w(j) > 0$,
as the probability of a vertex getting assigned a weight of $U$ is $w(j)/U$ (by
Lemma 1). The former point is not consistent with our objective of routing
the demands without having to split them across two nodes. To overcome the
splitting, we round the integral demands at $D$ up to the nearest powers of 2,
and solve the problem for these new (rounded) weights. This means that we
might install up to twice the required cable capacities, thereby losing a factor
of 2 in the approximation ratio. In exchange, we will have enough cable capacity
installed to route the original demands without having to split them.

Now, replace vertex $v \in D$ of weight $w(v)$ by $w(v)$ unit-weight vertices. Let
$\{v_1, \ldots, v_{w(v)}\}$ be the set of unit-weight vertices that represent $v$. We call $v$
the origin of $v_\ell$ for $\ell = 1$ to $w(v)$. Our algorithm will ensure that the unit-weight
demands having a common origin travel together, along a single path, towards
the sink.
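The preprocessing can be sketched as follows: each collected integral demand is rounded up to a power of two and expanded into unit-weight vertices. Two details are illustrative assumptions: a unit vertex is represented as a pair `(v, l)` that records its origin `v`, and the function is named `unit_vertices`.

```python
def unit_vertices(demands):
    """demands: dict vertex -> non-negative integer weight after redistribution.
    Each positive weight d is rounded up to the smallest power of two >= d and
    replaced by that many unit vertices (v, l), l = 1..p, all with origin v."""
    units = []
    for v, d in demands.items():
        if d <= 0:
            continue                       # v collected no weight
        p = 1 << (d - 1).bit_length()      # smallest power of two >= d
        units.extend((v, l) for l in range(1, p + 1))
    return units
```

Rounding up to a power of two at most doubles each demand, which is the source of the factor of 2 mentioned above.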

The algorithm given below proceeds in the same manner as that in [5]. At
the beginning of stage 1, $D_1 = D$, with each demand $j \in D$ having weight
$d_j = 1 = u_1$. In general, at the beginning of stage $t$, $D_t$ is a set of $|D|/u_t$
vertices, each with demand $u_t$. During stage $t$, our algorithm (presented below)
uses the value $u_{t+1}$ as the “aggregation threshold” to combine several demands
of weight $u_t$ into a single demand of weight $u_{t+1}$. The cables required to perform
such an aggregation are bought by the algorithm. The demand will reach the
root at the end of the algorithm. The final solution is then given by the union of
all the paths used in the aggregation stages. Given below are the steps performed
at stage $t$ of the algorithm.

S1. Mark each demand $v$ in $D_t$ with probability $p_t = u_t/g_t$, and let $D_t'$ be the
marked demands.

S2. Construct a $\rho$-approximate Steiner tree $T_t$ on $F_t = D_t' \cup \{r\}$. Install a cable
of type $t+1$ on each edge of this tree.

S3. For each vertex $j \in D_t$, send its weight $w(j)$ to the nearest member of
$F_t$ using cables of type $t$. If two vertices have a common origin, ensure
that both vertices send their weight to the same $i \in F_t$, as this guarantees
that vertices having a common origin travel together, thus satisfying the
indivisibility constraint. Let $S_v$ be the set of vertices, with common origin $v$,
that sent their weights to $i \in F_t$. Let $w_t(i)$ be the weight collected at $i \in F_t$.

S4. For each $i \in F_t$, order the vertices that sent their weight of $u_t$ to $i$ in
such a way that the vertices in $S_v$ are ordered before the vertices in $S_u$ if
$|S_v| > |S_u|$. Divide the vertices in the ordered set into groups of $u_{t+1}/u_t$ vertices,
starting from the first vertex, leaving behind $b_i = (w_t(i)/u_t) \bmod (u_{t+1}/u_t)$ residual
vertices at the end. Send the weight of $u_{t+1}$ emanating from each group of
$u_{t+1}/u_t$ vertices back from $i$ to a random member of that group, buying
new cables of type $t+1$. Since $u_t$, $u_{t+1}$ and $|S_k|$, for all $k$, are powers of 2
by definition, our construction ensures the following: (i) a set $S_k$ with $|S_k| \ge u_{t+1}/u_t$
is divided into $p$ groups, where $p \ge 1$ is a power of 2, and (ii) a set $S_k$
with $|S_k| < u_{t+1}/u_t$ belongs to exactly one group. This implies that vertices
with a common origin travel together.

S5. For each $i \in F_t$, divide the $b_i$ residual vertices into $q_i$ sets $R_i^1, \ldots, R_i^{q_i}$, with
each set containing vertices having a common origin, and the weight $w(R_i^\ell)$ of
a set $R_i^\ell$ being the number of vertices it contains. Let $F_t' = \emptyset$ initially. For
each $i \in F_t$, if $q_i \ge 1$, then add $q_i$ copies of $i$ into $F_t'$, one for each set, with
each copy carrying a weight of the sum of the vertex weights in the set that
it represents. Observe that the weights of the vertices in $F_t'$ are powers of 2.
Also, note that $T_t$ spans all the vertices in $F_t'$, as the vertices in $F_t'$ are mere
copies of the vertices in $F_t$. Use the procedure of Lemma 7 on $T_t$ spanning $F_t'$
with $U = u_{t+1}$ to aggregate residual weights into groups of weight exactly
$u_{t+1}$ in a subset of vertices in $F_t'$. During the redistribution procedure, for
every $i \in F_t$, ensure that its copies in $F_t'$ are visited consecutively. This, along
with the fact that $u_t \sum_{\ell=1}^{q_i} w(R_i^\ell) < u_{t+1}$ for every $i \in F_t$, ensures that at
most one copy of $i$ in $F_t'$, representing $i \in F_t$, gets assigned a weight of $u_{t+1}$.
Transform the redistribution among the vertices in $F_t'$ into a redistribution
among the vertices in $F_t$ by assigning a weight of $u_{t+1}$ to vertex $i \in F_t$ if
one of $i$’s copies was assigned a weight of $u_{t+1}$ in $F_t'$, and 0 otherwise. Notice
that the probability that a vertex $i \in F_t$ is assigned a weight of $u_{t+1}$ still
depends on $i$’s weight (residual weight, which is $b_i u_t$). For every $i \in F_t$ that
receives a weight of $u_{t+1}$, choose a vertex $v$ among the $b_i$ residual vertices
uniformly at random, and send the weight of $u_{t+1}$ from $i$ to $v$ using cables of
type $t+1$.
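The ordering and grouping of Step S4 can be illustrated with the Python sketch below. It models only the counts $|S_v|$ at one vertex $i$ (assumed to be powers of two) and the group size $k = u_{t+1}/u_t$. The helper names are hypothetical. Because larger sets come first, every set starts at a multiple of its own size. A set therefore either fills whole groups or fits inside one group, and `origins_stay_together` checks exactly this.

```python
def group_by_origin(set_sizes, k):
    """Step S4 sketch. set_sizes maps each origin v to |S_v| (a power of two);
    k = u_{t+1}/u_t (also a power of two). Vertices are ordered by decreasing
    |S_v| and cut into consecutive groups of k; the tail is the residual."""
    order = sorted(set_sizes, key=lambda v: -set_sizes[v])
    seq = [v for v in order for _ in range(set_sizes[v])]
    full = len(seq) - len(seq) % k
    groups = [seq[i:i + k] for i in range(0, full, k)]
    return groups, seq[full:]

def origins_stay_together(groups, residual, set_sizes, k):
    """Check properties (i) and (ii) of Step S4 for every origin."""
    for v, size in set_sizes.items():
        containing = [g for g in groups if v in g]
        if containing and v in residual:
            return False                  # S_v split between a group and the residual
        if size >= k and not all(set(g) == {v} for g in containing):
            return False                  # (i) S_v fills whole groups on its own
        if size < k and len(containing) > 1:
            return False                  # (ii) S_v lies within a single group
    return True
```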

When t = K, we set the probability for non-root vertices p_K = 0, which implies that no vertex in stage t = K is marked. The weights from all the vertices in D_K are directly routed to r using cables of capacity K. The approximation analysis for our SSBB algorithm is exactly the same as that for Gupta et al.'s DSSBB algorithm [5]. All the lemmas used to prove Theorem 1 hold for our SSBB algorithm as well, but with ε = 1. Recall that after the preprocessing step, we lose a factor of 2 from rounding up the weights of vertices to the nearest powers of 2. This means that our algorithm for the SSBB problem guarantees a ratio of twice that of Gupta et al.'s DSSBB algorithm. The cost C of our final solution is 2 times the cost in equation (3), and is given by

\[ C \le 2 \times 4 \times (2 + \rho + 1) \times 2(1 + 1) \sum_{s} C^*(s). \]

Using the current best approximation ratio of ρ = 1 + ln(3)/2 [7] for finding a Steiner tree, we obtain a ratio of 145.6.
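As a quick arithmetic check of the constant (our own calculation, assuming the bound is the product 2 × 4 × (2 + ρ + 1) × 2(1 + 1) times the optimal cost), the following Python snippet evaluates that product with ρ = 1 + ln(3)/2:

```python
import math

# Steiner tree approximation ratio used in the text: rho = 1 + ln(3)/2
rho = 1 + math.log(3) / 2

# Factors of the bound on C: 2 * 4 * (2 + rho + 1) * 2(1 + 1) = 32 * (3 + rho)
ratio = 2 * 4 * (2 + rho + 1) * 2 * (1 + 1)

print(round(ratio, 1))  # 145.6
```

The product simplifies to 32(3 + ρ) ≈ 145.58, which rounds to the stated 145.6.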

Theorem 2. Our algorithm for the SSBB problem guarantees an approximation 
ratio of 145.6. 
Abstract. We propose an approximation algorithm for the problem of finding a maximum stable matching when both ties and unacceptable partners are allowed in preference lists. Our algorithm achieves the approximation ratio 2 − c (log N)/N for an arbitrary positive constant c, where N denotes the number of men in an input. This improves the trivial approximation ratio of two.



1 Introduction 

The stable marriage problem is a matching problem first introduced by Gale and Shapley [4]. An instance of this problem consists of N men, N women, and each person's preference list. A preference list is a totally ordered list including all members of the opposite sex, ordered according to his/her preference. For a matching M between men and women, a pair of a man m and a woman w is called a blocking pair if (i) m prefers w to his current partner and (ii) w prefers m to her current partner. A matching with no blocking pair is called stable. The stable marriage problem is to find a stable matching for a given instance. Gale and Shapley showed that every instance admits at least one stable matching, and they also proposed the so-called Gale-Shapley algorithm to find one, which runs in O(N²) time [4].
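To make the algorithm concrete, here is a minimal Python sketch of the men-propose Gale-Shapley algorithm for strict preference lists. The encoding (dicts mapping each person to an ordered list of acceptable partners, best first) is our own assumption; the sketch also tolerates incomplete lists, treating a pair as acceptable only if each lists the other. Each man proposes to each woman at most once, which is where the O(N²) bound comes from.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men-propose Gale-Shapley for strict (possibly incomplete) lists.

    men_prefs[m] / women_prefs[w] list acceptable partners, best first.
    Returns a dict mapping each matched man to his partner.
    """
    # rank[w][m] = position of m in w's list (smaller is better)
    rank = {w: {m: i for i, m in enumerate(lst)} for w, lst in women_prefs.items()}
    next_idx = {m: 0 for m in men_prefs}  # index of the next woman m proposes to
    husband = {}                          # woman -> current fiance
    free = list(men_prefs)
    while free:
        m = free.pop()
        prefs = men_prefs[m]
        while next_idx[m] < len(prefs):
            w = prefs[next_idx[m]]
            next_idx[m] += 1
            if m not in rank.get(w, {}):
                continue                  # w finds m unacceptable
            cur = husband.get(w)
            if cur is None or rank[w][m] < rank[w][cur]:
                husband[w] = m            # w (tentatively) accepts m
                if cur is not None:
                    free.append(cur)      # her previous fiance becomes free
                break
        # if m exhausts his list, he stays single
    return {m: w for w, m in husband.items()}


men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m2", "m1"], "w2": ["m1", "m2"]}
print(gale_shapley(men, women))  # {'m2': 'w1', 'm1': 'w2'}
```

Breaking ties arbitrarily and running this procedure is one way to obtain a (weakly) stable matching when ties are present, as used later in the paper.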

However, considering an application to a large-scale assignment system, it is unreasonable to force agents to write all members of the other party in a strict order. Hence two natural relaxations are considered. One is to allow for indifference [6,11], in which each person is allowed to include ties in his/her preference list. When ties are allowed, the definition of stability needs to be extended. A man and a woman form a blocking pair if each strictly prefers the other to his/her current partner. A matching without such a blocking pair is called weakly stable (or simply "stable"), and the Gale-Shapley algorithm can be modified to always find a weakly stable matching [6]. The other is to allow participants to declare one or more unacceptable partners. Thus each person's preference list may be incomplete. Again, the definition of a blocking pair is extended, so that each member of the pair prefers the other over the current partner or is currently single and acceptable. In this case, a stable matching may not be a perfect matching, but all stable matchings for a fixed instance are of the same size [5]. Hence, finding a maximum cardinality stable matching is trivial.

However, if both ties and incomplete lists are allowed, one instance can admit stable matchings of different sizes, and it is known that the problem of finding a maximum stable matching, which we call MAX SMTI (MAXimum Stable Marriage with Ties and Incomplete lists), is NP-hard [14,17]. As for approximability, it is easy to see that two stable matchings for the same instance differ in size by at most a factor of two (see Theorem 5 of [17], for example). Since a stable matching can be found in polynomial time by a modified Gale-Shapley algorithm, the existence of an approximation algorithm with a factor of two is trivial. Very recently, [9] presented several approximability upper bounds that are significantly better than two for restricted inputs, such as a factor (depending on L) for instances where the length of ties is at most L and ties appear in only one sex.



Our Contribution. In this paper, we give the first nontrivial approximability result for general MAX SMTI. Namely, our new algorithm, based on local search, achieves an approximation factor of 2 − c (log N)/N, where c is an arbitrary positive constant. Starting from an initial stable matching, our algorithm successively increases the size of the solution. While the size of the current solution is at most OPT/2 + c log N, where OPT is the size of an optimal solution, we can increase the size by at least one. Hence, we finally obtain a stable matching of size greater than OPT/2 + c log N.



Related Results. There are several examples of using the stable marriage problem in assignment systems. Among others, one of the most famous applications is to assign medical students to hospitals based on the preference lists of both sides. For example, more than 30,000 applicants are enrolled in the hospitals/residents matching system in the U.S., which is known as NRMP [6,16]. In Japan, this kind of matching system came into use in 2003, and more than 95% of 8,000 applicants obtained their positions in its first year. Other examples are CaRMS in Canada and SPA in Scotland [12,13]. Another famous application is to assign students to schools in Norway [3] and Singapore [20].

Up to now, there has been much effort to obtain approximability and inapproximability results for MAX SMTI. For inapproximability, MAX SMTI was shown to be APX-hard [7], and subsequently, a lower bound of 21/19 on the approximation ratio (under the assumption that P ≠ NP) was presented [9]. This lower bound holds for restricted instances where ties appear in only one sex, the length of ties is two, and each person writes at most one tie. For approximability, there are some approximation algorithms with factor better than two for restricted inputs, where the restrictions mainly concern the occurrence of ties and/or the lengths of ties [17,8,9], as mentioned previously.
There are several optimization problems that resemble MAX SMTI in that designing a 2-approximation algorithm is trivial but obtaining a (2 − ε)-approximation algorithm for a positive constant ε is extremely hard, such as Minimum Vertex Cover (MIN VC for short) and Minimum Maximal Matching (MIN MM for short). As is the case with MAX SMTI, there are many approximability results for these problems on restricted instances. For example, MIN VC is approximable within 7/6 if the maximum degree of an input graph is bounded by 3 [2], or within 2/(1 + ε) if every vertex has degree at least ε|V| [15]. For MIN MM, there is a (2 − 1/d)-approximation algorithm for regular graphs with degree d [21], and a PTAS for planar graphs [19]. For general inputs, (2 − o(1))-approximation algorithms are known for MIN VC, namely, (2 − (log log |V|)/(2 log |V|))- and (2 − (1 − o(1)) (2 ln ln Δ)/(ln Δ))-approximation algorithms, where Δ is the maximum degree [18,1,10].

2 Preliminaries 

In this section, we formally define MAX SMTI and the approximation ratio of approximation algorithms.

An instance I of MAX SMTI consists of N men, N women, and each person's preference list, which may be incomplete and may include ties. If a person p writes a person q in his/her list, we say that q is acceptable to p. Let m be a man. If m strictly prefers w_i to w_j in I, we write w_i ≻_m w_j. If w_i and w_j are tied in m's list, we write w_i =_m w_j. The statement w_i ⪰_m w_j is true if and only if w_i ≻_m w_j or w_i =_m w_j. We use a similar notation for women's preference lists. Let M be a matching. If a man m is matched with a woman w in M, we write M(m) = w and M(w) = m. We say that m and w form a blocking pair for M (or simply, (m, w) blocks M) if the following three conditions are met: (i) M(m) ≠ w but m and w are acceptable to each other, (ii) w ≻_m M(m) or m is single in M, (iii) m ≻_w M(w) or w is single in M. For a matching M, BP(M) denotes the set of all blocking pairs for M. A matching M is called stable if and only if BP(M) = ∅. MAX SMTI is the problem of finding a largest stable matching.
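The definition of BP(M) translates directly into code. The Python sketch below (our own illustration, not code from the paper) uses a hypothetical rank encoding: pm[m][w] is the tie level of w in m's list (smaller is better, equal values mean a tie, a missing key means unacceptable), pw is the same for women, and a matching is a dict mapping every matched person to the partner. The usage example is a small instance with two stable matchings of sizes 1 and 2, illustrating that stable matchings can differ in size by a factor of two.

```python
def blocking_pairs(pm, pw, match):
    """Return all blocking pairs (m, w) for `match`, per conditions (i)-(iii)."""
    result = []
    for m, prefs in pm.items():
        for w, r in prefs.items():
            # (i) m and w are mutually acceptable and not matched together
            if m not in pw.get(w, {}) or match.get(m) == w:
                continue
            # (ii) m is single or strictly prefers w to his partner
            m_wants = m not in match or r < pm[m][match[m]]
            # (iii) w is single or strictly prefers m to her partner
            w_wants = w not in match or pw[w][m] < pw[w][match[w]]
            if m_wants and w_wants:
                result.append((m, w))
    return result


def is_stable(pm, pw, match):
    return not blocking_pairs(pm, pw, match)


# m1 ties w1 and w2; w1 ties m1 and m2; m2 and w2 each accept one person.
pm = {"m1": {"w1": 0, "w2": 0}, "m2": {"w1": 0}}
pw = {"w1": {"m1": 0, "m2": 0}, "w2": {"m1": 0}}
small = {"m1": "w1", "w1": "m1"}                          # size 1
large = {"m1": "w2", "w2": "m1", "m2": "w1", "w1": "m2"}  # size 2
print(is_stable(pm, pw, small), is_stable(pm, pw, large))  # True True
```

Because only strict preferences create blocking pairs, the ties in m1's and w1's lists are exactly what let the smaller matching remain stable.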

A goodness measure of an approximation algorithm T for a maximization problem is defined as usual: the approximation ratio of T is max{opt(x)/T(x)} over all instances x of size N, where opt(x) and T(x) are the sizes of the optimal solution and the algorithm's solution, respectively.

3 Overview of Algorithm LocalSearch(I)

Here we give an overview of our algorithm LocalSearch. We need two parameters k and c, which are fixed constants such that c < k/16. LocalSearch takes an input I of MAX SMTI and uses two subroutines, Increase and Stabilize.

Increase takes a stable matching M for I and a subset S of M such that |S| = k log N. It outputs a (not necessarily stable) matching M_0 such that |M_0| > |M|, and for any blocking pair (m, w) ∈ BP(M_0), either m or w (or both) is single in M_0. Increase may fail to find such a matching. In such a case, it returns an error.
case, it returns an error. 
Stabilize takes a (not necessarily stable) matching M_0 where, for any blocking pair (m, w) ∈ BP(M_0), either m or w (or both) is single in M_0. It outputs a stable matching of size at least |M_0| (Lemma 10).



Algorithm LocalSearch(I)

1: M := arbitrary stable matching for I;
     /* This can be done in polynomial time by arbitrary tie-breaking
        and applying the Gale-Shapley algorithm. */
2: while (true)
3: { select (k + 4c) log N edges from M in an arbitrary way,
       and let P be the set of selected edges;
4:   let P_1, P_2, ..., P_n be all subsets of P of size k log N;
5:   for i := 1 to n
6:     M_i := Increase(M, P_i);
       /* If Increase returns an error, let M_i be empty. */
7:   if (there is an M_i such that |M_i| > |M|)
8:     M_0 := M_i;
9:   else
10:    terminate and output M;
11:  M := Stabilize(M_0);
12: }

Fig. 1. Algorithm LocalSearch



The full description of LocalSearch is given in Fig. 1. One can see that each iteration of the while-loop increases the size of the stable matching by at least one. This process can continue as long as the condition at line 7 is true. Later, we show that this is the case if (1) an input S for Increase has some "nice" property, and (2) |M|, the size of the input stable matching for Increase, is at most OPT/2 + c log N (Lemma 4), where OPT denotes the size of a maximum stable matching, and c is the constant defined above. Furthermore, we show that, among P_1, P_2, ..., P_n obtained at line 4, there is at least one "nice" P_i if |M| < OPT/2 + c log N (Lemma 3). Hence, we have the following theorem:

Theorem 1. Given an SMTI instance I of size N, LocalSearch outputs a stable matching of size more than OPT/2 + c log N in time polynomial in N.
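The polynomial running time relies on the enumeration at line 4 staying small: with |P| = (k + 4c) log N, there are at most 2^{|P|} subsets, which equals N^{k+4c} if logarithms are taken to base 2 (our assumption; the text does not fix the base). The Python check below compares the exact binomial count with this bound; the sample values k = 8 and c = 0.25 are arbitrary choices satisfying c < k/16, and the helper name is hypothetical.

```python
from math import comb


def subset_count_and_bound(N, k, c):
    """Count the subsets P_i enumerated at line 4 and the bound 2^|P| = N^(k+4c).

    Assumes N is a power of two (base-2 logarithms) and that k*log N and
    4c*log N are integers, so the rounding below is exact.
    """
    log_n = N.bit_length() - 1            # log2 N when N is a power of two
    p_size = round((k + 4 * c) * log_n)   # |P| = (k + 4c) log N
    s_size = round(k * log_n)             # |P_i| = k log N
    return comb(p_size, s_size), 2 ** p_size


k, c = 8, 0.25                            # sample constants with c < k/16
for N in (16, 256, 1024):
    count, bound = subset_count_and_bound(N, k, c)
    print(N, count, count <= bound)
```

Since each subset triggers one polynomial-time call to Increase, the loop at lines 5 and 6 runs in time polynomial in N for fixed k and c.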

Since the constants c and k can be set arbitrarily large, we have the following corollary.

Corollary 1. For any positive constant c, there is a polynomial-time approximation algorithm for MAX SMTI with approximation ratio at most 2 − c (log N)/N.
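For completeness, the calculation behind Corollary 1 is short. Writing OPT for the size of a maximum stable matching and M for the output of LocalSearch, and assuming OPT ≤ N and c log N ≤ N (true for all sufficiently large N), Theorem 1 gives:

```latex
\frac{\mathrm{OPT}}{|M|}
  < \frac{\mathrm{OPT}}{\mathrm{OPT}/2 + c\log N}
  = 2 - \frac{2c\log N}{\mathrm{OPT}/2 + c\log N}
  \le 2 - \frac{2c\log N}{N/2 + N}
  = 2 - \frac{4c\log N}{3N}
  < 2 - \frac{c\log N}{N}.
```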

Before showing Increase and Stabilize, we prove an important property of P_1, P_2, ..., P_n obtained at line 4 of LocalSearch.
Let us fix an optimal solution M_opt, a largest stable matching for I (which, of course, we do not know). Given a stable matching M for I, let us define the following bipartite graph G_{M_opt,M}. Each vertex of G_{M_opt,M} corresponds to a person in I. There is an edge between vertices m and w if and only if M_opt(m) = w or M(m) = w. If both M_opt(m) = w and M(m) = w hold, we put two edges between m and w; hence G_{M_opt,M} is a multigraph. An edge (m, w) associated with M_opt(m) = w is called an OPT-edge. Similarly, an edge associated with M(m) = w is called an M-edge. Observe that the degree of each vertex is at most two, and hence each connected component of G_{M_opt,M} is a simple path, a cycle, or an isolated vertex.

Let us partition the M-edges of G_{M_opt,M} into good edges and bad ones. If an M-edge belongs to a connected component that is a path of length three starting and ending with OPT-edges, then it is called good. Otherwise, it is bad. We also call an edge in M good (bad, respectively) if the corresponding M-edge in G_{M_opt,M} is good (bad, respectively).
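The classification can be computed locally. Because every vertex of G_{M_opt,M} has degree at most two, an M-edge (m, w) is good exactly when M_opt(m) and M_opt(w) both exist, differ from w and m, and are single in M (so they are path endpoints). The Python sketch below implements this test; encoding both matchings as dicts from men to women is our own assumption, and the function name is hypothetical.

```python
def classify_edges(M, M_opt):
    """Split the edges of M into good and bad ones with respect to M_opt.

    M and M_opt map each matched man to his partner (an assumed encoding).
    An M-edge (m, w) is good iff its component in G_{M_opt,M} is the path
    M_opt(w) - w - m - M_opt(m), i.e. both OPT-partners exist, differ from
    the M-partners, and are single in M.
    """
    husband_in_M = {w: m for m, w in M.items()}
    husband_in_opt = {w: m for m, w in M_opt.items()}
    good, bad = [], []
    for m, w in M.items():
        w_opt = M_opt.get(m)              # m's partner in M_opt
        m_opt = husband_in_opt.get(w)     # w's partner in M_opt
        is_good = (
            w_opt is not None and w_opt != w and w_opt not in husband_in_M
            and m_opt is not None and m_opt != m and m_opt not in M
        )
        (good if is_good else bad).append((m, w))
    return good, bad


M = {"m1": "w1", "m3": "w3"}
M_opt = {"m1": "w2", "m2": "w1", "m3": "w3"}
print(classify_edges(M, M_opt))  # ([('m1', 'w1')], [('m3', 'w3')])
```

Here (m1, w1) sits on the path m2 - w1 - m1 - w2, while (m3, w3) forms a 2-cycle with its own OPT-edge and is therefore bad.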

Lemma 1. Let (m, w) be a good edge of M. Then, w ⪰_m M_opt(m) and m ⪰_w M_opt(w).

Proof. If M_opt(m) ≻_m w, then (m, M_opt(m)) is a blocking pair for M, since M_opt(m) accepts m and is single in M (because (m, w) is good). This contradicts the stability of M. So, w ⪰_m M_opt(m). For the same reason, m ⪰_w M_opt(w). □



Lemma 2. Let t be an arbitrary positive integer. If |M| < |M_opt|/2 + t, then the number of bad edges in G_{M_opt,M} is at most 4t.



Proof. First of all, we show that there is no path of length one in G_{M_opt,M}. This can be seen as follows: Suppose that there is a path of length one, say (m, w), and suppose that this is an OPT-edge. Then m and w write each other on their preference lists since they are matched in M_opt. However, both of them are single in M. This means that (m, w) is a blocking pair for M, which contradicts the stability of M. When (m, w) is an M-edge, a similar argument (with the roles of M and M_opt exchanged) leads to a contradiction.

Consider then each connected component C of G_{M_opt,M}. Let R(C) be the ratio of the number of OPT-edges to the number of M-edges in C. If C is a cycle, then it contains the same number of OPT-edges and M-edges, and hence R(C) = 1. The same holds if C is a path of even length. If C is a path of odd length starting and ending with M-edges, R(C) < 1 since the number of M-edges in C is larger than that of OPT-edges. If C is a path of length three starting and ending with OPT-edges, then the M-edge it contains is good and R(C) = 2. If C is a path of length more than three starting and ending with OPT-edges, then R(C) ≤ 3/2.

Now, suppose that there are G good edges and B bad edges. Then, the number of OPT-edges, namely |M_opt|, is at most 2G + (3/2)B by the above argument. Since G + B = |M| and |M| < |M_opt|/2 + t, we have that B < 4t. □
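Spelled out, the final step of the proof of Lemma 2 combines the two facts as follows:

```latex
|M_{\mathrm{opt}}| \le 2G + \tfrac{3}{2}B
  = 2(G + B) - \tfrac{1}{2}B
  = 2|M| - \tfrac{1}{2}B
  < |M_{\mathrm{opt}}| + 2t - \tfrac{1}{2}B
\quad\Longrightarrow\quad B < 4t.
```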
Lemma 3. If |M| < |M_opt|/2 + c log N, then there is at least one i such that P_i contains only good edges.

Proof. Recall that |P| = (k + 4c) log N. Since there are at most 4c log N bad edges in M, as proved in Lemma 2 (applied with t = c log N), P contains at least k log N good edges. Since we enumerate all subsets of P of size k log N, there must be a P_i with only good edges. □



4 Procedure Increase(M, S) 

Recall that Increase takes a stable matching M and its subset S of size k log N as input, and outputs a matching, say M', where |M'| > |M|. M' may not be stable for I, but it satisfies the property that for any blocking pair (m, w) ∈ BP(M'), either m or w (or both) is single in M'. Before going into the details, we roughly explain the execution of Increase.

In the following, we assume that S consists of only good edges. (As proved in Lemma 3, there is a way of choosing such an S if |M| < |M_opt|/2 + c log N.) Given S, let S_i be a subset of S whose size is |S|/4. Since each edge in S_i is good, for each person p in S_i, his/her partner in M_opt is single in M. We divorce all couples of S_i, and then make them find partners who are single in M. They may not find their partners in M_opt, but if we try all possible S_i, at least one choice will give us a good result, i.e., every person in S_i finds a partner who is at least as good as the partner in M_opt (Lemma 5). Let L be the set of newly added edges. Then, it is not hard to see that |L| = 2|S_i|, and hence we can increase the size of M by |S_i|. (See Fig. 2 (a).)

In the latter half of the algorithm, we do the following: If there is a blocking pair (m, w) such that both m and w have partners, say, w' and m', respectively, then we can prove that exactly one of (m, w') and (m', w) is in L. We then remove the one that is not in L. (See Fig. 2 (b).) This process may decrease the size of the matching, but we prove that the decrease is less than |S_i|. In total, we can increase the size of the matching by at least one. The full description of algorithm Increase is given in Fig. 3.



[Figure: panels (a) and (b)]

Fig. 2. Execution of Increase
Procedure Increase(M, S)

1: F^m := set of all single men in M; F^w := set of all single women in M;
2: let S_1, S_2, ..., S_n be all subsets of S of size |S|/4;
3: for i := 1 to n
4: { S_i^m := set of all men in S_i; S_i^w := set of all women in S_i;
5:   Find a matching between S_i^m and F^w
       using the men-propose Gale-Shapley algorithm;
       (To do this, remove all persons not in S_i^m ∪ F^w from each person's list,
        and break all ties arbitrarily.)
6:   Find a matching between S_i^w and F^m
       using the women-propose Gale-Shapley algorithm;
       (To do this, remove all persons not in S_i^w ∪ F^m from each person's list,
        and break all ties arbitrarily.)
7:   if (∃p s.t. p ∈ S_i^m ∪ S_i^w and p remains single after the Gale-Shapley algorithm)
8:     exit for-loop;  /* the current i was not a good choice */
9:   else
10:  { L := the set of all pairs obtained by the Gale-Shapley algorithm;
11:    M_i := M − S_i ∪ L;
12:    while (∃(m, w) ∈ BP(M_i) s.t. both m and w have a partner in M_i)
13:    { if ((m, M_i(m)) ∈ L and (M_i(w), w) ∈ L)
14:        exit for-loop;  /* the current i was not a good choice */
15:      if ((m, M_i(m)) ∈ M_i − L and (M_i(w), w) ∈ M_i − L)
16:        exit for-loop;  /* the current i was not a good choice */
17:      if ((m, M_i(m)) ∈ M_i − L and (M_i(w), w) ∈ L)
18:        M_i := M_i − {(m, M_i(m))};
19:      if ((m, M_i(m)) ∈ L and (M_i(w), w) ∈ M_i − L)
20:        M_i := M_i − {(M_i(w), w)};
21:    }  /* end while */
22:    if (|M_i| > |M|)
23:      output M_i and terminate;
24:    else exit for-loop;  /* the current i was not a good choice */
25:  }  /* end else */
26: }  /* end for */
27: output "error" and terminate;

Fig. 3. Procedure Increase
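The Python sketch below implements one iteration of the for-loop of Increase, that is, lines 4 to 24 of Fig. 3 for a single choice of S_i; it returns None wherever the figure abandons the current i. The encoding is our own assumption: preferences are rank dicts (smaller is better, equal ranks are ties, a missing key means unacceptable), matchings are sets of (man, woman) pairs, and ties are broken by name, which is one admissible "arbitrary" tie-breaking. All function names are hypothetical.

```python
def partners(pairs):
    """Map every matched person to the partner, for a set of (man, woman) pairs."""
    part = {}
    for m, w in pairs:
        part[m], part[w] = w, m
    return part


def blocking_pairs(pm, pw, pairs):
    """All (m, w) satisfying the three blocking conditions for `pairs`."""
    part = partners(pairs)
    return [(m, w) for m, prefs in pm.items() for w, r in prefs.items()
            if m in pw.get(w, {}) and part.get(m) != w
            and (m not in part or r < pm[m][part[m]])
            and (w not in part or pw[w][m] < pw[w][part[w]])]


def gale_shapley(proposers, receivers, prop_pref, recv_pref):
    """Proposer-side Gale-Shapley restricted to `receivers`; ties broken by name."""
    allowed = set(receivers)
    lists = {p: sorted((r for r in prop_pref[p] if r in allowed and p in recv_pref[r]),
                       key=lambda r: (prop_pref[p][r], r))
             for p in proposers}
    nxt = {p: 0 for p in proposers}
    holder = {}                            # receiver -> proposer currently held
    free = list(proposers)
    while free:
        p = free.pop()
        if nxt[p] == len(lists[p]):
            continue                       # p was rejected everywhere: stays single
        r = lists[p][nxt[p]]
        nxt[p] += 1
        cur = holder.get(r)
        if cur is None or (recv_pref[r][p], p) < (recv_pref[r][cur], cur):
            holder[r] = p
            if cur is not None:
                free.append(cur)
        else:
            free.append(p)                 # rejected: p proposes again later
    return {p: r for r, p in holder.items()}


def increase_step(pm, pw, M, S_i):
    """One choice of S_i in Increase (lines 4-24 of Fig. 3); None if i fails."""
    matched = partners(M)
    single_men = [m for m in pm if m not in matched]      # F^m
    single_women = [w for w in pw if w not in matched]    # F^w
    men_S = [m for m, _ in S_i]
    women_S = [w for _, w in S_i]
    L1 = gale_shapley(men_S, single_women, pm, pw)        # line 5
    L2 = gale_shapley(women_S, single_men, pw, pm)        # line 6
    if len(L1) < len(men_S) or len(L2) < len(women_S):
        return None                                       # lines 7-8
    L = set(L1.items()) | {(m, w) for w, m in L2.items()}  # line 10
    Mi = (set(M) - set(S_i)) | L                          # line 11
    while True:                                           # lines 12-21
        part = partners(Mi)
        bp = [(m, w) for m, w in blocking_pairs(pm, pw, Mi)
              if m in part and w in part]
        if not bp:
            break
        m, w = bp[0]
        e_m, e_w = (m, part[m]), (part[w], w)
        if (e_m in L) == (e_w in L):
            return None                                   # lines 13-16
        Mi.remove(e_w if e_m in L else e_m)               # lines 17-20
    return Mi if len(Mi) > len(M) else None               # lines 22-24


# M = {(m1, w1)} is stable; (m1, w1) is good w.r.t. M_opt = {(m1, w2), (m2, w1)}.
pm = {"m1": {"w1": 0, "w2": 0}, "m2": {"w1": 0}}
pw = {"w1": {"m1": 0, "m2": 0}, "w2": {"m1": 0}}
print(sorted(increase_step(pm, pw, {("m1", "w1")}, [("m1", "w1")])))
# [('m1', 'w2'), ('m2', 'w1')]
```

In the example, divorcing (m1, w1) lets m1 pair with the single woman w2 and w1 with the single man m2, so the matching grows from one pair to two with no blocking pair between matched people.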



4.1 Correctness of Increase 

We give a sufficient condition for Increase to achieve a successful computation.

Lemma 4. If S consists of only good edges, and if |M| < |M_opt|/2 + c log N, then there is at least one way of selecting i such that Increase succeeds.

The proof of this lemma uses a series of lemmas. In the following lemmas, we keep the assumptions of Lemma 4, namely, that S consists of only good edges and |M| < |M_opt|/2 + c log N, even if they are not explicitly stated in each of the following lemmas.

Lemma 5. There exists i* such that, after executing the Gale-Shapley algorithm (at lines 5 and 6 of Fig. 3), every person in S_{i*}^m ∪ S_{i*}^w is matched with a partner who is at least as good as his/her partner in M_opt.

Proof. Consider the following procedure. (Note that we consider this procedure only for the proof of this lemma. It cannot be performed by algorithm Increase, since Increase does not know M_opt.) Let S^m and S^w be the sets of all men and all women in S, respectively. Modify the preference lists of all persons in S^m ∪ S^w ∪ F^m ∪ F^w in the same way as in the execution of Increase. Furthermore, in each man m (∈ S^m)'s list, remove all women strictly below M_opt(m). Similarly, in each woman w (∈ S^w)'s list, remove all men strictly below M_opt(w). It should be noted that for any person p in S^m ∪ S^w, M_opt(p) is in F^m ∪ F^w since every element of S is a good edge, and hence M_opt(p) is not removed from p's list.

Apply the men-propose Gale-Shapley algorithm to the subinstance defined by S^m and F^w. It is not hard to see that at least half of S^m are matched at the termination of the Gale-Shapley algorithm. To see this, suppose the contrary, and let A ⊆ S^m be the set of single men (|A| > |S^m|/2). Then, each man m in A is rejected by M_opt(m). (Recall that M_opt(m) is in m's list.) When M_opt(m) rejected m, M_opt(m) was matched with someone better than m, and during the rest of the execution of the Gale-Shapley algorithm, she never becomes single. So, at the termination, more than |S^m|/2 women are matched, but this means that more than |S^m|/2 men are matched, a contradiction.

Now, if m ∈ S^m has a partner after the execution of the Gale-Shapley algorithm, call m a successful man. Call a woman in S^w a successful woman if and only if her partner in M is a successful man (there are at least |S|/2 successful women). Now apply the women-propose Gale-Shapley algorithm to the subinstance defined by all successful women in S^w and F^m. If, in the resulting matching, a successful woman gets a partner, call her a super-successful woman. For the same reason as above, at least half of all successful women are super-successful. Call a pair (m, w) ∈ S a super-successful pair if and only if w is a super-successful woman. There are at least |S|/4 super-successful pairs.

Since S_1, S_2, ..., S_n are all subsets of S with size exactly |S|/4, there exists at least one i such that S_i consists of only super-successful pairs. Let i* be one such i. It is not hard to see that after Increase completes the Gale-Shapley algorithm (at lines 5 and 6), each person in S_{i*}^m and S_{i*}^w is matched with a partner who is at least as good as the one obtained by the above procedure. This completes the proof. □

In the following lemmas, i* always denotes an index that satisfies the condition of Lemma 5.

Lemma 6. M_{i*} at line 11 of Fig. 3 satisfies the following (1) and (2): (1) |M_{i*}| = |M| + (k/4) log N. (2) Consider an arbitrary blocking pair (m, w) ∈ BP(M_{i*}) such that both m and w are matched in M_{i*}. Then, exactly one of (m, M_{i*}(m)) and (M_{i*}(w), w) is in M_{i*} − L and the other is in L.
Proof. (1) Recall that |S_{i*}| = |S|/4 = (k/4) log N and |L| = 2|S_{i*}|. Then, |M_{i*}| = |M| − |S_{i*}| + |L| = |M| + |S_{i*}| = |M| + (k/4) log N.

(2) First, suppose that both (m, M_{i*}(m)) and (M_{i*}(w), w) are in M_{i*} − L. Observe that, by the construction of M_{i*}, both of these pairs are also in M. This means that (m, w) ∈ BP(M), which contradicts the stability of M.

Next, suppose that both (m, M_{i*}(m)) and (M_{i*}(w), w) are in L. We have four cases to consider: (i) m ∈ F^m, w ∈ F^w, (ii) m ∈ S_{i*}^m, w ∈ F^w, (iii) m ∈ F^m, w ∈ S_{i*}^w, and (iv) m ∈ S_{i*}^m, w ∈ S_{i*}^w.

Case (i): By the definition of F^m and F^w, both m and w are single in M. But since (m, w) forms a blocking pair for M_{i*}, m and w write each other on their lists. This contradicts the stability of M.

Case (ii): By the assumption that (m, w) is a blocking pair for M_{i*}, w ≻_m M_{i*}(m). Since w ∈ F^w, w stays in m's list when his list is modified to apply the Gale-Shapley algorithm. So, during the execution of the Gale-Shapley algorithm at line 5, m proposed to w, but w rejected m, so M_{i*}(w) ⪰_w m. Then (m, w) cannot block M_{i*}, a contradiction.

Case (iii): Similar to Case (ii).

Case (iv): Since (m, w) is a blocking pair for M_{i*}, w ≻_m M_{i*}(m) and m ≻_w M_{i*}(w). But by Lemma 5, M_{i*}(m) ⪰_m M_opt(m) and M_{i*}(w) ⪰_w M_opt(w). Then, w ≻_m M_opt(m) and m ≻_w M_opt(w), which means that (m, w) is a blocking pair for M_opt, a contradiction. □

The proof of Lemma 4 is completed by the following lemma, which guarantees the size of M_{i*} at line 22 of Fig. 3.

Lemma 7. M_{i*} at line 22 of Increase satisfies |M_{i*}| > |M|.

Proof. First of all, it should be noted that Increase never fails on i* at lines 7 and 8 by Lemma 5. Also, during the execution of the while-loop on i*, Increase never fails by Lemma 6 (2). By Lemma 6 (1), we know that |M_{i*}| = |M| + (k/4) log N at line 11.

However, during the execution of the while-loop, some pairs may be removed from M_{i*} − L, which may decrease the size of M_{i*}. Note that all pairs in M_{i*} − L are pairs in M. In the following, we show that if a pair in M_{i*} − L is removed during the while-loop, then the pair must be a bad edge of M. If this is true, the number of pairs removed in the while-loop is at most 4c log N by Lemma 2, and thus |M_{i*}| ≥ |M| + (k/4) log N − 4c log N > |M|. (Recall that c < k/16.)

Suppose that during the while-loop of Increase, some pair is removed from M_{i*}. Then, there is a blocking pair (m, w) for M_{i*} such that both m and w are matched in M_{i*}. We have two cases: (1) (m, M_{i*}(m)) ∈ L and (M_{i*}(w), w) ∈ M_{i*} − L (and hence (M_{i*}(w), w) is removed); (2) (m, M_{i*}(m)) ∈ M_{i*} − L and (M_{i*}(w), w) ∈ L (and hence (m, M_{i*}(m)) is removed). We consider only Case (1). (Case (2) can be treated similarly.) Now, suppose that the removed pair (M_{i*}(w), w) is a good edge of M. We will show a contradiction.

For Case (1), we further consider two cases: (1-1) m ∈ F^m and (1-2) m ∈ S_{i*}^m.

Case (1-1): Note that m is single in M since m ∈ F^m. Now observe that, as (M_{i*}(w), w) ∈ M_{i*} − L, w and M_{i*}(w) are matched in M, namely, M_{i*}(w) = M(w). Since (m, w) ∈ BP(M_{i*}), it follows that (m, w) ∈ BP(M), which contradicts the stability of M. (In this case, we obtain a contradiction without assuming that (M_{i*}(w), w) is a good edge of M.)

Case (1-2): Since we assume that (M_{i*}(w), w) is a good edge of M, M(w) ⪰_w M_opt(w) by Lemma 1. For the same reason as above, M_{i*}(w) = M(w). So, M_{i*}(w) ⪰_w M_opt(w). As (m, w) is a blocking pair for M_{i*}, it follows that m ≻_w M_{i*}(w) ⪰_w M_opt(w). Next, consider the man m, which we assumed to be in S_{i*}^m. By Lemma 5, M_{i*}(m) ⪰_m M_opt(m). Again, as (m, w) is a blocking pair for M_{i*}, w ≻_m M_{i*}(m). So, w ≻_m M_{i*}(m) ⪰_m M_opt(m). Consequently, (m, w) is in BP(M_opt), a contradiction. □

5 Procedure Stabilize(M_0)

Stabilize takes a matching M_0 and makes it stable without decreasing its size. Recall that for any blocking pair (m, w) for M_0, at least one of m and w is single in M_0. For a matching M, define BP_{s,m}(M) ⊆ BP(M) to be the set of all blocking pairs (m, w) for M such that m is single in M and w is matched in M. Similarly, BP_{m,s}(M) (BP_{s,s}(M) and BP_{m,m}(M), respectively) denotes the set of all blocking pairs (m, w) for M such that m is matched and w is single (both m and w are single, and both m and w are matched, respectively) in M. Define BP_{*,s}(M) = BP_{m,s}(M) ∪ BP_{s,s}(M). Fig. 4 shows the procedure Stabilize.

Procedure Stabilize(M_0)

1: while (BP_{s,m}(M_0) ≠ ∅)
2: { select (m, w) ∈ BP_{s,m}(M_0);
3:   w* := woman s.t. (m, w*) ∈ BP_{s,m}(M_0) and
        there is no (m, w') ∈ BP_{s,m}(M_0) s.t. w' ≻_m w*;
4:   M_0 := M_0 − {(M_0(w*), w*)} ∪ {(m, w*)};
5: }
6: while (BP_{*,s}(M_0) ≠ ∅)
7: { select (m, w) ∈ BP_{*,s}(M_0);
8:   m* := man s.t. (m*, w) ∈ BP_{*,s}(M_0) and
        there is no (m', w) ∈ BP_{*,s}(M_0) s.t. m' ≻_w m*;
9:   if (m* is matched in M_0)
10:    M_0 := M_0 − {(m*, M_0(m*))} ∪ {(m*, w)};
11:  else
12:    M_0 := M_0 ∪ {(m*, w)};
13: }

Fig. 4. Procedure Stabilize
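A Python sketch of Fig. 4 follows, reusing the same assumed encoding as our illustrations above (rank dicts with smaller meaning better, matchings as sets of (man, woman) pairs); the helper names are hypothetical. It assumes the precondition of Stabilize, namely that every blocking pair of M_0 has a single member, and picks the first blocking pair found at lines 2 and 7; `min` returns one best candidate among ties, which matches the "there is no strictly better" condition at lines 3 and 8.

```python
def partners(pairs):
    """Map every matched person to the partner, for a set of (man, woman) pairs."""
    part = {}
    for m, w in pairs:
        part[m], part[w] = w, m
    return part


def blocking_pairs(pm, pw, pairs):
    """All (m, w) satisfying the three blocking conditions for `pairs`."""
    part = partners(pairs)
    return [(m, w) for m, prefs in pm.items() for w, r in prefs.items()
            if m in pw.get(w, {}) and part.get(m) != w
            and (m not in part or r < pm[m][part[m]])
            and (w not in part or pw[w][m] < pw[w][part[w]])]


def stabilize(pm, pw, M0):
    """Sketch of Fig. 4; assumes every blocking pair of M0 has a single member."""
    M = set(M0)
    while True:                                  # lines 1-5: BP_{s,m}
        part = partners(M)
        bp = [(m, w) for m, w in blocking_pairs(pm, pw, M)
              if m not in part and w in part]
        if not bp:
            break
        m = bp[0][0]
        # w*: a best woman for m among his (single man, matched woman) pairs
        w_star = min((w for mm, w in bp if mm == m), key=lambda w: pm[m][w])
        M.remove((part[w_star], w_star))         # her old partner becomes single
        M.add((m, w_star))
    while True:                                  # lines 6-13: BP_{*,s}
        part = partners(M)
        bp = [(m, w) for m, w in blocking_pairs(pm, pw, M) if w not in part]
        if not bp:
            break
        w = bp[0][1]
        m_star = min((m for m, ww in bp if ww == w), key=lambda m: pw[w][m])
        if m_star in part:
            M.remove((m_star, part[m_star]))     # line 10: m* leaves his partner
        M.add((m_star, w))                       # lines 10 and 12
    return M


pm = {"m1": {"w1": 0, "w2": 1}, "m2": {"w1": 0, "w2": 1}}
pw = {"w1": {"m1": 0, "m2": 1}, "w2": {"m1": 0, "m2": 1}}
print(sorted(stabilize(pm, pw, {("m2", "w1")})))  # [('m1', 'w1'), ('m2', 'w2')]
```

In the example, the first loop lets the single man m1 take w1 from m2, and the second loop then matches the single woman w2 with m2, giving a stable matching that is larger than the input.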
5.1 Correctness of Stabilize 

Lemma 8. Suppose that an application of line 4 of Stabilize updates M_0 as follows:

M_0' := M_0 − {(M_0(w*), w*)} ∪ {(m*, w*)}.




A (2 − c log N/N)-Approximation Algorithm for the Stable Marriage Problem 359



Then, the following (1) through (3) hold. (1) M_0'(w*) ≻_{w*} M_0(w*), and for any w (≠ w*), M_0'(w) = M_0(w). (2) |M_0'| = |M_0|. (3) If BP_{m,m}(M_0) = ∅, then BP_{m,m}(M_0') = ∅.

Proof. (1) Since (m*, w*) is in BP(M_0), m* ≻_{w*} M_0(w*). So, M_0'(w*) ≻_{w*} M_0(w*) because M_0'(w*) = m*. The latter part of (1) is trivial because, among all women, only w* changed her partner.

(2) This is trivial.

(3) Observe that three persons changed their partners by the update from M_0 to M_0': w* obtained a better partner, m* became matched from single, and M_0(w*) became single from matched. So, any blocking pair arising from the change from M_0 to M_0' is associated with the man M_0(w*). Since M_0(w*) is single in M_0', any pair in BP(M_0') − BP(M_0) is not in BP_{m,m}(M_0').

Next, consider (m, w) ∈ BP(M_0') ∩ BP(M_0). Since BP_{m,m}(M_0) = ∅, at least one of m and w is single in M_0. Recall that only m* changed his status from single to matched. So if m ≠ m*, then (m, w) ∉ BP_{m,m}(M_0').

Now consider a blocking pair (m*, w) ∈ BP(M_0') ∩ BP(M_0). If w was single in M_0, she is also single in M_0', and hence (m*, w) ∉ BP_{m,m}(M_0'). So assume that w was matched in M_0. In this case, both (m*, w*) and (m*, w) were in BP_{s,m}(M_0). So, both w* and w were candidates for being matched with m* in M_0. But since w* was selected, it must be the case that w* ⪰_{m*} w. Hence (m*, w) cannot block M_0', a contradiction.

We have shown that no element of BP(M_0') − BP(M_0) or of BP(M_0') ∩ BP(M_0) is in BP_{m,m}(M_0'). This completes the proof. □

Lemma 9. Suppose that an application of line 10 or line 12 of Stabilize updates M_0 as follows:

(Line 10) M_0' := M_0 − {(m*, M_0(m*))} ∪ {(m*, w*)}.

(Line 12) M_0' := M_0 ∪ {(m*, w*)}.

Then, the following (1) through (3) hold. (1) In the case of executing line 10, M_0'(m*) ≻_{m*} M_0(m*) (in the case of executing line 12, m* becomes matched in M_0'), and for any m (≠ m*), M_0'(m) = M_0(m). (2) |M_0'| ≥ |M_0|. (3) If BP_{s,m}(M_0) ∪ BP_{m,m}(M_0) = ∅, then BP_{s,m}(M_0') ∪ BP_{m,m}(M_0') = ∅.

Proof. The proof is similar to that of Lemma 8 and is omitted. □



Lemma 10. Let M' be the output of Stabilize. Then |M'| ≥ |M_0| and M' is stable.

Proof. Consider an application of line 4 of Stabilize. By Lemma 8 (1), one woman gets strictly better off and all other women keep their partners. Since there are N women, each with a preference list of length at most N, the number of iterations of the first while-loop is at most N². Let M'' be the matching just before Stabilize starts the second while-loop. Then BP_{s,m}(M'') is empty. (This is the condition for Stabilize to exit from the first while-loop.) Since BP_{m,m}(M_0) is empty (by the assumption on the input of Stabilize), we can show that BP_{m,m}(M'') is empty by applying Lemma 8 (3) repeatedly. Combining these two facts, it follows that BP_{s,m}(M'') ∪ BP_{m,m}(M'') is empty. Also, by Lemma 8 (2), |M''| = |M_0|.

Similarly, each application of line 10 or 12 makes a man better off (Lemma 9 (1)), and hence the number of iterations of the second while-loop is at most N². Since BP_{s,m}(M'') ∪ BP_{m,m}(M'') = ∅, we can show that BP_{s,m}(M') ∪ BP_{m,m}(M') = ∅ by using Lemma 9 (3) repeatedly. Moreover, the termination condition of Stabilize says that BP_{*,s}(M') = ∅. Consequently, BP(M') is empty, and hence M' is stable. By Lemma 9 (2), |M'| ≥ |M''|. So, |M'| ≥ |M_0|. □
Abstract. Given a set of rectangles, we are asked to pack as many of them as possible into a bigger rectangle. The packed rectangles may not overlap and may not be rotated. This problem is NP-hard in the strong sense even for packing squares into a square. We establish the relationship between the asymptotic worst-case ratio and the (absolute) worst-case ratio for the problem. It is proved that there exists an asymptotic FPTAS, and thus a PTAS, for packing squares into a rectangle. We give an approximation algorithm with asymptotic ratio of at most two for packing rectangles, and further show a simple (2 + ε)-approximation algorithm.



1 Introduction 

In this paper we consider a rectangle packing problem that maximizes the throughput. This problem is stated as follows. We are given a set of rectangles R_i = (a_i, b_i), i = 1, ..., n, where a_i ≤ a and b_i ≤ b are the width and the length of R_i, respectively. The goal is to pack a subset of the rectangles, without any overlap, into a bigger rectangle R = (a, b) (a ≤ b) such that the total number of packed rectangles is maximized. Rotation of rectangles is not allowed. This problem is denoted as URP. We also consider a special case, called USP, in which a_i = b_i for all R_i. In this case we are asked to pack a maximum number of squares into a rectangle.
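The feasibility conditions of URP can be checked mechanically. The Python sketch below (our own illustration; the function name and the (x, y, w, h) placement format are hypothetical) tests whether a proposed placement is a valid packing into R = (a, b): every rectangle is axis-parallel and unrotated, lies inside the bin, and no two interiors overlap. The objective value of a valid packing is simply the number of placed rectangles.

```python
def is_valid_packing(a, b, placed):
    """Check a packing of axis-parallel rectangles into the bin R = (a, b).

    placed: list of (x, y, w, h) with lower-left corner (x, y), width w
    (along the side of length a) and length h (along the side of length b).
    Rectangles must lie inside [0, a] x [0, b] and have disjoint interiors.
    """
    for x, y, w, h in placed:
        if x < 0 or y < 0 or x + w > a or y + h > b:
            return False
    for i in range(len(placed)):
        x1, y1, w1, h1 = placed[i]
        for j in range(i + 1, len(placed)):
            x2, y2, w2, h2 = placed[j]
            # interiors overlap iff the projections overlap on both axes
            if x1 < x2 + w2 and x2 < x1 + w1 and y1 < y2 + h2 and y2 < y1 + h1:
                return False
    return True


packing = [(0, 0, 1, 1), (1, 0, 1, 1), (0, 1, 2, 2)]
print(is_valid_packing(2, 3, packing), len(packing))  # True 3
```

Touching edges are allowed (the comparisons are strict), so the three rectangles above tile the 2 × 3 bin exactly.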

Our general problem is packing as many rectangles as possible into a rectangular bin. It is a dual version of the two-dimensional bin packing problem, which requires packing a set of rectangles without overlap into the minimum number of unit squares. For the latter, Bansal and Sviridenko [2] proved APX-hardness in the asymptotic case for packing rectangles into square bins. Correa and Kenyon [4] and Bansal and Sviridenko [2] showed that there exists an asymptotic PTAS for packing squares into square bins. Our problem is also related to strip packing, in which rectangles are packed into a strip of width one and infinite length. The goal is to minimize the length of the packing in the strip. Kenyon and Remila [7] gave an asymptotic FPTAS. Schiermeyer [9] and Steinberg [10] showed an absolute worst-case ratio of two with different approaches.

Known results. Leung et al. [8] proved that the problem of determining whether a set of squares can be packed into a bigger square or not is strongly NP-complete. Therefore, it is NP-hard in the strong sense to pack squares into a square so as to maximize the number of packed squares. Hochbaum and Maass [5] presented a PTAS by applying a shifting technique to many geometric problems. One of these problems is packing a maximum number of identical squares into a rectangular grid. Baker et al. [1] showed an asymptotic 4/3-approximation algorithm for packing a maximum number of squares into a rectangle (problem USP). A more general problem was recently considered in [6], in which each rectangle has an associated weight and the goal is to maximize the total weight of the packed rectangles. A number of approximation algorithms were given, the best of which has a worst-case ratio of at most 2 + ε. Note that this (2 + ε)-approximation algorithm is only of theoretical interest. It consists of several steps, such as guessing the gap structure and solving a large number of linear programs, which are of extremely high time complexity. The case in which the weights are equal to the rectangle areas was addressed by Caprara and Monaci [3]. They mainly focused on exact algorithms. A polynomial-time (3 + ε)-approximation algorithm was also derived.

Our contribution. We consider both packing squares and packing rectangles. We first prove that there is an asymptotic FPTAS (AFPTAS) for problem USP. The approach is based on a greedy algorithm and the AFPTAS by Kenyon and Remila [7] for strip packing. We then derive a PTAS, which is best possible since the problem is strongly NP-hard. For problem URP we first give a simple algorithm with asymptotic ratio at most two and a running time of $O(n \log^2 n/\log\log n)$. By adding an enumeration strategy we obtain a $(2+\varepsilon)$-approximation algorithm, whose running time is bounded by … . Note that this time complexity is much lower than that of the $(2+\varepsilon)$-approximation algorithm in [6], which is doubly exponential in $1/\varepsilon$.

Organization of the paper. The remainder of the paper is organized as follows. Section 2 presents preliminaries. In Section 3, an AFPTAS and a PTAS are given for packing squares. In Section 4, the problem URP for packing rectangles is considered. Section 5 gives some concluding remarks.

2 Preliminaries 

Our problem asks for a subset of items of maximum cardinality that forms a feasible packing, i.e., that can be packed into the rectangle $(a, b)$.
Theorem 2.1. [10] Given a set of rectangles, if either the length of every rectangle is at most $b/2$ or the width of every rectangle is at most $a/2$, and if their total area is at most $ab/2$, then the rectangles can be packed into $(a, b)$.

The running time of Steinberg’s algorithm is $O(n \log^2 n/\log\log n)$. Throughout the paper we use his algorithm to pack a set of items whenever the conditions of Theorem 2.1 are satisfied.
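Before Steinberg's algorithm is invoked, the hypotheses of Theorem 2.1 have to be verified. The Python sketch below only checks those hypotheses and does not construct the packing. The function name `satisfies_theorem_2_1` and the `(width, length)` tuple representation are our own assumptions. We also assume, as in the problem statement, that every rectangle fits into the bin on its own.

```python
from typing import Iterable, Tuple

def satisfies_theorem_2_1(rects: Iterable[Tuple[float, float]], a: float, b: float) -> bool:
    """Hypothetical helper: test the sufficient condition of Theorem 2.1 for the bin (a, b).

    Each rectangle is a (width, length) pair; width is measured along a, length along b.
    Assumes every rectangle individually fits (width <= a, length <= b).
    """
    rects = list(rects)
    all_short = all(length <= b / 2 for _, length in rects)   # every length at most b/2
    all_narrow = all(width <= a / 2 for width, _ in rects)    # every width at most a/2
    total_area = sum(width * length for width, length in rects)
    return (all_short or all_narrow) and total_area <= a * b / 2
```

For example, in a bin $(4, 6)$ the rectangles $(1, 3)$, $(1, 3)$, $(2, 1)$ pass the check: every length is at most 3, and the total area 8 is at most 12.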

Let $S_i = a_i \cdot b_i$ be the area of rectangle $R_i = (a_i, b_i)$ for $i = 1, \dots, n$, and let $C = a \cdot b$ be the area of the bigger rectangle $R = (a, b)$. Without loss of generality we assume that $S_i \le S_{i+1}$ for $i = 1, \dots, n-1$. For convenience, we also call the rectangles (squares for USP) items and the bigger rectangle $(a, b)$ a bin. We distinguish long, wide, big and small items. An item $R_i$ is called long if $b_i > b/2$ and $a_i \le a/2$, while $R_i$ is wide if $a_i > a/2$ and $b_i \le b/2$. An item $R_i$ is called big if $b_i > b/2$ and $a_i > a/2$, whereas $R_i$ is small if $b_i \le b/2$ and $a_i \le a/2$.
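The four item classes follow from two comparisons. The hypothetical helper below (the name `item_type` is ours) returns the class of an item `(width, length)` with respect to the bin `(a, b)`, using the strict and non-strict inequalities exactly as stated above.

```python
def item_type(width: float, length: float, a: float, b: float) -> str:
    """Hypothetical helper: classify an item relative to the bin (a, b) as in Section 2."""
    tall = length > b / 2    # more than half the bin length
    broad = width > a / 2    # more than half the bin width
    if tall and broad:
        return "big"
    if tall:
        return "long"
    if broad:
        return "wide"
    return "small"
```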

To evaluate an approximation algorithm, we use the standard measures. For maximization problems the Worst-Case Ratio is defined as

$$R_A = \sup_I \frac{N_{\mathrm{opt}}(I)}{N_A(I)},$$

where $N_{\mathrm{opt}}(I)$ and $N_A(I)$ are, respectively, the optimal value and the objective value given by an approximation algorithm $A$ for an instance $I$. An algorithm $A$ is called a $\rho$-approximation algorithm if $N_{\mathrm{opt}}(I)/N_A(I) \le \rho$ for every instance $I$. The Asymptotic Worst-Case Ratio is defined as

$$R_A^{\infty} = \lim_{K \to \infty} \sup_I \left\{ \frac{N_{\mathrm{opt}}(I)}{N_A(I)} \;\middle|\; N_{\mathrm{opt}}(I) \ge K \right\}.$$

We say that algorithm $A$ is an asymptotic $\rho$-approximation algorithm if there exists a constant $\beta$ such that $N_{\mathrm{opt}}(I) \le \rho N_A(I) + \beta$ holds for every instance $I$. Throughout the paper, we may write $N_{\mathrm{opt}}$ instead of $N_{\mathrm{opt}}(I)$ and $N_A$ instead of $N_A(I)$ when no confusion arises.

Let $A$ be an asymptotic $\rho$-approximation algorithm. For any given $\varepsilon > 0$, there exists a constant integer $N \ge \beta/\varepsilon$ such that $N_{\mathrm{opt}} \le (\rho+\varepsilon) N_A$ whenever $N_{\mathrm{opt}} \ge N$. We can construct an improved approximation algorithm, denoted by $\bar{A}$, as follows.

1. Try all possibilities to pack up to $N$ items. Denote by $N_1$ the maximum number of items packed among all feasible packings. If $N_1 < N$, output a packing with $N_1$ items and stop.

2. Apply algorithm $A$.
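The two-step construction of the improved algorithm can be sketched in Python as follows. The feasibility oracle `feasible` (for URP it would decide whether a subset of items fits into the bin) and the asymptotic algorithm `base_algorithm` are hypothetical placeholders that the caller supplies. The early exit relies on packing feasibility being hereditary: every subset of a packable set is packable.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

Item = TypeVar("Item")

def improved_algorithm(
    items: Sequence[Item],
    N: int,
    feasible: Callable[[Sequence[Item]], bool],                # hypothetical oracle
    base_algorithm: Callable[[Sequence[Item]], List[Item]],    # plays the role of A
) -> List[Item]:
    """Illustrative sketch of the wrapper: exhaustive search for small optima, else A."""
    best: List[Item] = []
    # Step 1: enumerate subsets of up to N items, in increasing size.
    for size in range(1, N + 1):
        found = next((list(s) for s in combinations(items, size) if feasible(s)), None)
        if found is None:
            # Feasibility is hereditary, so no larger subset can be feasible either.
            break
        best = found
    if len(best) < N:
        return best                  # N_1 < N: this packing is optimal
    # Step 2: the optimum has at least N items, so fall back to algorithm A.
    return base_algorithm(items)
```

Because the sizes are enumerated in increasing order, step 1 stops as soon as some size admits no feasible subset. At that point the largest size found is $N_1$.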

Since $N$ is a constant, it takes constant time to check whether up to $N$ items can be packed into a bin. In the first step of $\bar{A}$, we need to consider no more than … subsets of items. Therefore, the running time of the first step is … . Algorithm $\bar{A}$ clearly outputs either an optimal packing (if it stops at the first step) or a packing with at least $N_{\mathrm{opt}}/(\rho+\varepsilon)$ items. The following lemma holds.
Lemma 2.2. If there is an asymptotic $\rho$-approximation algorithm for problem URP, then there exists a $(\rho+\varepsilon)$-approximation algorithm for any $\varepsilon > 0$. □



Corollary 2.3. If there is an asymptotic FPTAS, there exists a PTAS for Problem USP. □



3 Packing Squares 



Recall that the algorithm NFI proposed by Baker et al. [1] has an asymptotic worst-case ratio of at most 4/3. By Lemma 2.2 there exists an algorithm with worst-case ratio $4/3 + \varepsilon$. In this section, we improve this result by providing a PTAS. For convenience, we denote a square $(a_i, a_i)$ by $a_i$ when no confusion arises.

Recall the problem: given a set of items with side lengths $a_i \le a$, we are asked to pack as many squares as possible into a bin $(a, b)$. In the following, we present an approximation algorithm, denoted by $U(\varepsilon)$, which depends on the error bound $\varepsilon \in (0, 1)$. The algorithm distinguishes two cases. For any given small number $0 < \varepsilon < 1$, let $M$ be a sufficiently large constant, bounded below by a power of $1/\varepsilon$.

Algorithm $U(\varepsilon)$.

Case 1. $a > b/M$. Let $\varepsilon_0 = \varepsilon/(M+2)^2$ and $N = \lceil (1+\varepsilon_0)/\varepsilon_0 \rceil$. Recall that $C = a \cdot b$.

1. Assume that $a_1 \le a_2 \le \dots \le a_n$. Find $k$ such that $\sum_{i=1}^{k} a_i^2 \le C$ and $\sum_{i=1}^{k+1} a_i^2 > C$. We only consider the squares $a_1, \dots, a_k$. Note that $N_{\mathrm{opt}} \le k$.

2. If $k \le N$, try all possibilities to pack a subset of $\{a_1, \dots, a_k\}$ into $(a, b)$. This yields an optimal packing.

3. If $k > N$, pack the squares by shelf packing as follows.
 - Determine $k_1$ such that $a - a_{k_1+1} < a_1 + \dots + a_{k_1} \le a$. These $k_1$ squares are packed into a shelf of length $a_{k_1}$. Let $p = 1$.
 - Determine $k_{p+1}$ such that $a - a_{k_{p+1}+1} < a_{k_p+1} + \dots + a_{k_{p+1}} \le a$.
   If $\sum_{i=1}^{p+1} a_{k_i} \le b$, pack $a_{k_p+1}, \dots, a_{k_{p+1}}$ into a shelf of length $a_{k_{p+1}}$, set $p = p+1$, and repeat this step (to find the next $k_{p+1}$).
   Otherwise, reset $k_{p+1}$ to be the largest possible index such that $\sum_{i=1}^{p+1} a_{k_i} \le b$, pack $a_{k_p+1}, \dots, a_{k_{p+1}}$ into a shelf of length $a_{k_{p+1}}$, and stop.

An illustration is shown in Figure 1.
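Steps 1 and 3 of Case 1 can be sketched in Python as follows. The exhaustive search of step 2 is omitted. The function name is hypothetical. A square is represented by its side length, and a shelf is a list of side lengths whose last (largest) element is the shelf length along $b$.

```python
from typing import List

def shelf_pack_case1(sides: List[int], a: int, b: int) -> List[List[int]]:
    """Hypothetical sketch of steps 1 and 3 of Case 1 of U(eps) for the bin (a, b)."""
    sides = sorted(sides)
    # Step 1: keep the k smallest squares whose total area is at most C = a * b.
    k, area = 0, 0
    while k < len(sides) and area + sides[k] ** 2 <= a * b:
        area += sides[k] ** 2
        k += 1
    squares = sides[:k]

    shelves: List[List[int]] = []
    used = 0   # total length of the shelves opened so far
    i = 0
    while i < len(squares):
        start, width = i, 0
        # Fill the shelf while the total side length stays within a.
        while i < len(squares) and width + squares[i] <= a:
            width += squares[i]
            i += 1
        shelf = squares[start:i]
        if not shelf:
            break      # a square wider than the bin; excluded since a_i <= a
        if used + shelf[-1] <= b:
            shelves.append(shelf)
            used += shelf[-1]
        else:
            # Last shelf: keep only the (prefix of) squares that fit the remaining length.
            last = [s for s in shelf if used + s <= b]
            if last:
                shelves.append(last)
            break
    return shelves
```

For a bin $(4, 4)$ and squares with sides 3, 2, 2, 2, 1, 1, 1, 1, the area test keeps the seven smallest squares. The sketch returns the shelves [1, 1, 1, 1] and [2, 2]. The third $2 \times 2$ square is left unpacked because only length 1 remains.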

Case 2. $a \le b/M$. In this case we apply the AFPTAS of Kenyon and Remila [7] to our problem. Since the side length of every item (square) is bounded by $a$, we can pack at least $M$ items.
1. Determine $k$ as in Case 1.

2. If the total area of the $k$ items is at most $C/2$, pack all of them into the bin $(a, b)$ by Steinberg’s algorithm (see Theorem 2.1).

3. If the total area of the $k$ items exceeds $C/2$, consider a strip packing instance: apply a scaling strategy (dividing all widths of the items by $a$) and pack the $k$ scaled items, each of width at most one, into a strip of width one and unlimited length. Let $\delta \le \varepsilon/12$. Run the AFPTAS [7] on the $k$ items, and let $H$ be the length of the resulting packing. If $H \le b$, we are done. If $b < H \le (1+\varepsilon/6)b$, go to the next step. Otherwise, remove the item with the largest area, set $k = k - 1$, and repeat Step 3.

4. Cut the strip (of length at most $(1+\varepsilon/6)b$) into pieces, each of length $\varepsilon b/6$ (the last piece may be shorter). Index the pieces from left to right. Among the pieces with indices $3j - 1$, $j \ge 1$, find the one that contains the fewest items (including those partially packed in the piece). Remove all the items from this piece (including those partially packed in it). Figure 2 illustrates how the strip is cut into pieces.
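The piece selection in step 4 amounts to counting, for every piece with index $3j-1$, the items whose extent along the strip meets that piece. The sketch below assumes that each packed item is described by the interval `(start, end)` it occupies along the strip length. This representation and the function name are our own, hypothetical choices.

```python
import math
from typing import List, Set, Tuple

def cheapest_piece(spans: List[Tuple[float, float]], strip_length: float,
                   piece_len: float) -> Tuple[int, Set[int]]:
    """Hypothetical sketch of step 4 of Case 2 of U(eps).

    spans[x] = (start, end) is the interval occupied by item x along the strip.
    Among the pieces with 1-based index 3j - 1, return j and the set of items
    meeting that piece (completely or partially), choosing the smallest such set.
    Returns (0, set()) if the strip has fewer than two pieces.
    """
    num_pieces = math.ceil(strip_length / piece_len)
    best_j, best_set = 0, set()
    j = 1
    while 3 * j - 1 <= num_pieces:
        lo = (3 * j - 2) * piece_len        # piece 3j - 1 covers [lo, hi)
        hi = lo + piece_len
        involved = {x for x, (s, e) in enumerate(spans) if s < hi and e > lo}
        if best_j == 0 or len(involved) < len(best_set):
            best_j, best_set = j, involved
        j += 1
    return best_j, best_set
```

A caller would then delete the returned items and shift everything to the right of the emptied piece to the left by `piece_len`.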




Fig. 2. Cutting the strip into pieces
Theorem 3.1. Algorithm $U(\varepsilon)$ is an AFPTAS for USP.

Proof. Following the description of the algorithm, we analyze the two cases of $U(\varepsilon)$ separately, using the same notation as above.

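Case 1. $a > b/M$. Let $a_q$ be the smallest unpacked square (see Figure 1), and let $N_r$ be the number of squares left unpacked by algorithm $U(\varepsilon)$. We first estimate $N_r$ and $N_{U(\varepsilon)}$, respectively. Consider the shelves generated by $U(\varepsilon)$, and assume that there are $f + 1$ of them. Let $h_i$ be the total side length of the squares in the $i$-th shelf. Clearly, $a - h_i < a_q$ for $i = 1, \dots, f$. The length of the $i$-th shelf is $a_{k_i}$, and the smallest square in the $(i+1)$-st shelf is $a_{k_i+1}$, for $i = 1, 2, \dots, f$. We estimate the total area $A_R$ of the items packed into the bin. Note that $A_R \ge a_1 h_1 + a_{k_1+1} h_2 + \dots + a_{k_f+1} h_{f+1}$.

If $a - h_{f+1} \ge a_q$, then $a_{k_1} + a_{k_2} + \dots + a_{k_f} > b - a_q$. Hence $a_{k_1} + a_{k_2} + \dots + a_{k_{f-1}} > b - 2a_q$, and thus

$$\begin{aligned}
A_R &\ge a_{k_1} h_2 + \dots + a_{k_{f-1}} h_f \\
    &\ge (a - a_q)(a_{k_1} + \dots + a_{k_{f-1}}) \\
    &\ge (b - 2a_q)(a - a_q) \\
    &\ge ab - (2a + b)a_q.
\end{aligned}$$

If $a - h_{f+1} < a_q$, then $a_{k_1} + a_{k_2} + \dots + a_{k_f} + a_{k_{f+1}} > b - a_q$. Hence $a_{k_1} + a_{k_2} + \dots + a_{k_f} > b - 2a_q$, and thus

$$\begin{aligned}
A_R &\ge a_{k_1} h_2 + \dots + a_{k_f} h_{f+1} \\
    &\ge (a - a_q)(a_{k_1} + \dots + a_{k_f}) \\
    &\ge (b - 2a_q)(a - a_q) \\
    &\ge ab - (2a + b)a_q.
\end{aligned}$$

By construction, the total area of the $k$ items is at most $C = ab$. Moreover, every unpacked item has area at least $a_q^2$. Thus,

$$N_r \le \frac{ab - A_R}{a_q^2} \le \frac{2a + b}{a_q}.$$

On the other hand, all packed items have side length at most $a_q$. Therefore

$$N_{U(\varepsilon)} \ge \left(\frac{a}{a_q} - 1\right)\left(\frac{b}{a_q} - 1\right) \ge \left(\frac{a}{a_q} - 1\right)^2.$$

Note that $N_{\mathrm{opt}} \le k = N_r + N_{U(\varepsilon)}$. If $a_q \ge (2a + b)\varepsilon_0$, then $N_r \le 1/\varepsilon_0$, and in this case

$$N_{\mathrm{opt}} \le N_{U(\varepsilon)} + N_r \le N_{U(\varepsilon)} + 1/\varepsilon_0.$$

If $a_q < (2a + b)\varepsilon_0$, then

$$\begin{aligned}
\frac{N_{\mathrm{opt}}}{N_{U(\varepsilon)}} &\le 1 + \frac{N_r}{N_{U(\varepsilon)}}
 \le 1 + \frac{(2a+b)\,a_q}{(a - a_q)^2} \\
 &\le 1 + \frac{(2a+b)^2 \varepsilon_0}{\bigl(a - (2a+b)\varepsilon_0\bigr)^2}
 = 1 + \frac{(2 + b/a)^2 \varepsilon_0}{\bigl(1 - (2 + b/a)\varepsilon_0\bigr)^2} \\
 &\le 1 + \frac{(M+2)^2 \varepsilon_0}{\bigl(1 - (M+2)\varepsilon_0\bigr)^2}.
\end{aligned}$$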



Case 2. $a \le b/M$. In this case we apply the AFPTAS of Kenyon and Remila [7] to our problem. Since the side length of every item (square) is bounded by $a$, we can pack at least $M$ items.
If the total area of the $k$ items is at most $C/2$, we can pack all of them into the bin $(a, b)$ by Steinberg’s algorithm, and thus obtain an optimal packing. Assume that the total area of the $k$ items exceeds $C/2$. Then, for the strip packing problem (pack the $k$ items into a strip of width $a$ and unlimited length so as to minimize the length used), the optimal length $H^*$ is larger than $b/2$. Run the AFPTAS [7] on the $k$ items. Since $b \ge M$ after the scaling, we have $H^* > b/2 \ge M/2$, which is large because $M$ is a sufficiently large constant. Therefore, $H \le (1+\delta)H^* + O(1/\delta^2) \le (1+\varepsilon/6)H^*$ when $\varepsilon > 0$ is small enough, where $\delta \le \varepsilon/12$. (We are interested in an AFPTAS with bound $1+\varepsilon$, so we can always assume that $\varepsilon$ is a sufficiently small positive number.) If $H \le b$, all items are packed into the bin $(a, b)$. If $H > (1+\varepsilon/6)b$, then $H^* > b$, which shows that no feasible packing of the $k$ items exists. Recall that the selection procedure chose the $k$ items with the smallest areas. Hence $N_{\mathrm{opt}} < k$. Remove the item with the largest area and repeat the process until $H \le (1+\varepsilon/6)b$. When the third step of the algorithm is completed, let $k$ denote the number of remaining items. Then $N_{\mathrm{opt}} \le k$.

If $b < H \le (1+\varepsilon/6)b$, the strip (of length at most $(1+\varepsilon/6)b$) is cut into pieces of length $\varepsilon b/6$. The number $P$ of pieces is at least $\lfloor 6/\varepsilon \rfloor$. Index the pieces from left to right. Let $A_j$ be the set of items involved (completely or partially) in piece $3j - 1$, $j = 1, 2, \dots, \lfloor P/3 \rfloor$. Remove the set with the smallest number of items among all the $A_j$'s. Any item can be (partially) involved in at most two pieces, because its side length is at most $a \le b/M$, which does not exceed the piece length $\varepsilon b/6$ when $M$ is large enough. It follows that $A_i \cap A_j = \emptyset$ if $i \ne j$. Thus the number of items removed is at most $k/\lfloor P/3 \rfloor \le k\varepsilon$ (when $\varepsilon \le 1/2$). The remaining items can be packed into the bin $(a, b)$ by shifting them.

Finally, we obtain a feasible packing with at least $(1-\varepsilon)k$ items. In Case 1 the algorithm runs in $O(n \log n)$ time. In Case 2 the running time is dominated by up to $n$ calls of the AFPTAS of Kenyon and Remila. Therefore, algorithm $U(\varepsilon)$ is an AFPTAS for problem USP. □

Corollary 3.2. There exists a PTAS for USP. 

Proof. It follows directly from Corollary 2.3. To construct the PTAS we only 
need to do enumeration for the first case. □ 



4 Packing Rectangles 

In this section, we present an approximation algorithm, denoted by $P^3$, which consists of three phases. Before presenting the algorithm, we list some necessary conditions that every feasible packing fulfills:

- There is at most one big item.

- The total length of the wide items and the big item (if it exists) is at most $b$.

- The total width of the long items and the big item (if it exists) is at most $a$.

- The total area of the items is bounded above by $C$.

Algorithm $P^3$ (Pre-selecting, Partitioning and Packing).

Pre-selecting items.

1. Sort the items in non-decreasing order of their areas.

2. Select items from the head of the ordered list as long as the total area of the selected items is at most $C$, and from now on consider only the selected items. If the total width of the long items and the big item (if it exists) is at most $a$ and the total length of the wide items and the big item (if it exists) is at most $b$, set state = 1. Otherwise, set state = 0.
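Phase 1 can be sketched in Python as follows. The sketch uses the observation that an item is long or big exactly when its length exceeds $b/2$, and wide or big exactly when its width exceeds $a/2$. The function name and the `(width, length)` representation are our own, hypothetical choices.

```python
from typing import List, Tuple

Rect = Tuple[float, float]   # (width a_i, length b_i); representation is our assumption

def preselect(items: List[Rect], a: float, b: float) -> Tuple[List[Rect], int]:
    """Hypothetical sketch of Phase 1 of P^3: return the selected items and `state`."""
    C = a * b
    selected: List[Rect] = []
    area = 0.0
    # Non-decreasing order of area; take items from the head while the area stays <= C.
    for w, l in sorted(items, key=lambda r: r[0] * r[1]):
        if area + w * l > C:
            break
        selected.append((w, l))
        area += w * l
    # Long or big items: length > b/2.  Wide or big items: width > a/2.
    width_long_big = sum(w for w, l in selected if l > b / 2)
    length_wide_big = sum(l for w, l in selected if w > a / 2)
    state = 1 if width_long_big <= a and length_wide_big <= b else 0
    return selected, state
```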

Partitioning items. 

1. If state = 0, put the long items and the big item (if it exists) into a group $G_1$, and put the wide items and the small items into another group $G_2$. Since the total width of the (long or big) items in $G_1$ is larger than $a$, their total area is more than $C/2$. Hence the total area of the items in $G_2$ is less than $C/2$. Let RL be the list of remaining items, which have not been considered yet. Move all long items from RL to $G_1$. Add as many items as possible from RL (in which the items are still in non-decreasing order of area) to $G_2$, as long as the total area of the items in $G_2$ is at most $C/2$. Finally, sort the items in $G_1$ in non-increasing order of width, and remove items one by one from the head of the list until the total width of the remaining items in $G_1$ is at most $a$.

2. If state = 1, partition the selected items into two groups: put the long items into $G_1$ and the wide items into $G_2$. The small items go to $G_1$ or $G_2$ in an arbitrary way, provided that the total area of the items in $G_1$ and the total area of the items in $G_2$ are both at most $C/2$. If there is a big item, discard it. At most one item remains unpacked (including the big item if it exists).
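The final trimming step of the state = 0 branch is a simple greedy removal. Below is a minimal sketch with a hypothetical function name, where items are `(width, length)` pairs.

```python
from typing import List, Tuple

def trim_group_by_width(group: List[Tuple[float, float]], a: float) -> List[Tuple[float, float]]:
    """Hypothetical helper for the last step of the state = 0 branch of Phase 2.

    Sort by non-increasing width and drop items from the head (widest first)
    until the total width of the remaining items is at most a.
    """
    ordered = sorted(group, key=lambda r: r[0], reverse=True)
    total = sum(w for w, _ in ordered)
    start = 0
    while start < len(ordered) and total > a:
        total -= ordered[start][0]   # remove the widest remaining item
        start += 1
    return ordered[start:]
```

Removing the widest items first reduces the total width as fast as possible, so it minimizes the number of items that have to be dropped from $G_1$.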

Packing items. After the second phase, both groups of items admit a feasible packing. $G_2$ consists of wide items and small items, and the total area of its items is at most $C/2$. By Theorem 2.1, the items in $G_2$ can be packed into the bin $(a, b)$. If state = 0, $G_1$ contains only long items and at most one big item, and their total width is at most $a$. If state = 1, $G_1$ consists of long items and small items, and the total area of its items is at most $C/2$. In both cases there is a feasible packing. Of the packings of $G_1$ and $G_2$, choose the one with more items.

Theorem 4.1. For any set $I$ of items the following inequality holds:

$$N_{\mathrm{opt}}(I) \le 2N_{P^3}(I) + 1. \qquad (1)$$

Proof. Let OPT be an optimal packing and let $X$ be the number of items accepted in the first phase of algorithm $P^3$. The accepted items are the smallest in area, and their total area would exceed $C$ if any other item were added. This implies that $N_{\mathrm{opt}} \le X$. We deal with two cases as follows.

Case 1. state = 1 when algorithm $P^3$ ends. Consider the second step of Phase 2 (partitioning items). If there is a big item, it is discarded. All the small items can be put into one of the groups. Indeed, suppose that some small item, say $R_i$, can be put into neither of the groups. Then for $j = 1, 2$ the total area of the items of $G_j$ would exceed $C/2$ if $R_i$ were added. Note that the area of the big item is larger than $S_i$, the area of $R_i$. This implies that the total area of the items (including $R_i$ and the big item) is larger than $C$, which is a contradiction since the total area of the items after the pre-selection is at most $C$. Therefore, only the big item is unpacked. If there is no big item at all, we can show analogously that at most one small item is unpacked. Then we have $|G_1| + |G_2| \ge X - 1$, i.e., $2N_{P^3}(I) \ge X - 1$, while $N_{\mathrm{opt}} \le X$. This shows that Equation (1) holds.

Case 2. state = 0 when the algorithm ends. After the first phase, the total width of the long items and the big item (if it exists) is larger than $a$ (the case in which the total length of the wide items is larger than $b$ is analogous). In this case the total length of the wide items must be smaller than $b$. Otherwise, the total area of the items would be larger than $C$. Let $G_1'$ be the minimum set of the long items that are smallest in area whose total width is at least $a$; then the total area of these long items is larger than $C/2$. That is, the total width drops below $a$ if the largest of these items in area is removed. Clearly $|G_1'| \le |G_1| + 1$, since $G_1$ contains the long items with the smallest widths.

Now we analyze the optimal packing OPT. Assume that OPT contains $\ell$ long items and $s$ other items (wide or small items). Moreover, let $OPT_1$ be the set of the $\ell$ long items and the big item (if it exists), and let $OPT_2$ be the set of the $s$ other items. If $s > |G_2|$, the total area of the items of $OPT_2$ must be larger than $C/2$. Then the total area of the items of $OPT_1$ is less than $C/2$. Recall that $G_1'$ contains the smallest long items in area and that the total area of its items is larger than $C/2$. Comparing $G_1'$ with $OPT_1$ shows that some (long) item of $G_1'$, say $R_j$, is missing from $OPT_1$. Replace the largest item in area of $OPT_2$ with the missing item $R_j$ and put $R_j$ into $OPT_1$. We get a new set of items $OPT'$ (which might be infeasible) in which the number of long items increases by one and the number of other items decreases by one. This implies that $|OPT'| = N_{\mathrm{opt}}$. Continue this replacement until the total area of the other (small or wide) items is at most $C/2$; the total number of items remains unchanged. Finally, we can show that $s \le |G_2|$ and $\ell \le |G_1'|$. It follows that $N_{\mathrm{opt}} \le |G_1'| + |G_2| \le |G_1| + |G_2| + 1$. On the other hand, $N_{P^3}(I) \ge (|G_1| + |G_2|)/2$. We conclude that Equation (1) holds. □

The running time of algorithm $P^3$ is $O(n \log^2 n/\log\log n)$. Phases 1 and 2 take $O(n \log n)$ time, and Phase 3 takes $O(n \log^2 n/\log\log n)$ time.

By Lemma 2.2 we immediately obtain the following result. 

Corollary 4.2. There exists a $(2+\varepsilon)$-approximation algorithm for Problem URP. □

In Equation (1) the additive constant is one. Thus the running time of the $(2+\varepsilon)$-approximation algorithm is …
5 Concluding Remarks

In this paper we dealt with packing the maximum number of items into a rectangular bin. We have proved that there is a PTAS for packing squares, which improves the previous upper bound of $4/3 + \varepsilon$. This is the best possible approximation result, since the problem is NP-hard in the strong sense. For packing rectangles we designed a simple $(2+\varepsilon)$-approximation algorithm. A $(2+\varepsilon)$-approximation algorithm already exists for the more general weighted case [6], but our algorithm is much simpler and has much lower time complexity. It remains open whether URP admits a PTAS.
Abstract. In this paper we consider the linear time algorithm of Kasai et al. [6] for the computation of the Longest Common Prefix (LCP) array given the text and the suffix array. We show that this algorithm can be implemented without any auxiliary array in addition to the ones required for the input (the text and the suffix array) and the output (the LCP array). Thus, for a text of length n, we reduce the space occupancy of this algorithm from 13n bytes to 9n bytes.

We also consider the problem of computing the LCP array by “overwriting” the suffix array. For this problem we propose an algorithm whose space occupancy can be bounded in terms of the empirical entropy of the input text. Experiments show that for linguistic texts our algorithm uses roughly 7n bytes. Our algorithm makes use of the Burrows-Wheeler Transform even though it does not represent any data in compressed form. To our knowledge this is the first application of the Burrows-Wheeler Transform outside the domain of data compression.

The source code for the algorithms described in this paper has been included in the lightweight suffix sorting package [13], which is freely available under the GNU GPL.



1 Introduction 

The suffix array [11] is a simple and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data. The vitality of this data structure is demonstrated by the large number of suffix array construction algorithms developed in the last two years [1,5,7,8,14]. The suffix array of a text t[1,n] is the lexicographically sorted list of all its suffixes. The suffix array is often used together with the Longest Common Prefix array (LCP array from now on), which contains the length of the longest common prefix between every pair of lexicographically consecutive suffixes. The LCP information can be used to speed up suffix array algorithms and to simulate the more powerful, but more resource-consuming, suffix tree data structure [6,11].

In [6] Kasai et al. describe a simple (13 lines of C code) and elegant linear 
time algorithm for computing the LCP array given the text and the suffix array. 

* Partially supported by the Italian MIUR projects “Algorithmics for Internet and the 
Web” and “Technologies and Services for Enhanced Content Delivery”. 



T. Hagerup and J. Katajainen (Eds.): SWAT 2004, LNCS 3111, pp. 372—383, 2004. 
This was an important result for several reasons. First, although many suffix array construction algorithms can be modified to return the LCP array as well, this is not true for every algorithm. Decoupling the two problems allows one to choose the suffix array construction algorithm that best suits one's needs, without being restricted to the algorithms that also provide the LCP array. Moreover, in some applications the LCP array is needed later than the suffix array: if the two have to be computed simultaneously, some temporary storage must be used for the LCP array.

The only drawback of the algorithm of Kasai et al. is its large space occupancy. Assume a “real world” model in which each text symbol takes one byte and each suffix array or LCP array entry takes 4 bytes. In this model the algorithm of Kasai et al. uses 13n bytes, where n is the length of the input text. Since the input and output of the computation (text, suffix array, and LCP array) take 9n bytes, there is a 4n-byte overhead. This is a serious issue, since it is nowadays common to work with files that are hundreds of megabytes long.
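The 13n and 9n figures follow from simple bookkeeping. The tally below assumes, as the description of Lcp13 in Section 2.2 indicates, that the only auxiliary array is the rank array with 4-byte entries:

$$\underbrace{n}_{t} + \underbrace{4n}_{\mathrm{Sa}} + \underbrace{4n}_{\mathrm{Lcp}} + \underbrace{4n}_{\mathrm{Rank}} = 13n \text{ bytes}, \qquad 13n - 9n = 4n \text{ bytes of overhead}.$$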

In this paper we present a modified version of the algorithm of Kasai et al. which only uses 9n bytes of storage. Our algorithm, called Lcp9, runs in linear time and has the same simplicity and elegance as the original algorithm. Experiments with several files of different size and structure show that Lcp9 is only 5%–10% slower than the algorithm of Kasai et al.¹

In our “real world” model, a space occupancy of 9n bytes is optimal if we assume that at the end of the computation we need the text, the suffix array, and the LCP array. However, this is no longer true if one is interested only in the LCP array, that is, if at the end of the computation we no longer need the suffix array. In this case, the space initially used for storing the suffix array can be reused during the computation and for the storage of the LCP array. In this scenario we can aim at a space occupancy as low as 5n bytes. Computing the LCP array while discarding the suffix array has applications in string matching, data compression and text analysis. For example, using the algorithm described in [6, Sect. 5], a single pass over the LCP array simulates a post-order visit of the suffix tree of the text t. In some applications, for example the construction of the compression booster described in [3], such a visit does not need the information stored in the suffix array.

If we only need the LCP array, even the 9n-byte space occupancy of algorithm Lcp9 becomes the space bottleneck of the whole computation. The reason is that there are “lightweight” suffix array construction algorithms [1,14] that only use (5 + ε)n bytes with ε ≪ 1. In this paper we address this issue by proposing a simple linear-time algorithm, called Lcp6, which computes the LCP array by “overwriting” the suffix array. The space used by Lcp6 depends on the regularity of the input text t[1,n] and can be bounded in terms of the k-th order empirical entropy of t. If t is highly compressible, the space occupancy of Lcp6 can be as small as 6n bytes. Conversely, if t is a “random” string, the space required by our algorithm can be as large as 10n bytes. However, the first step of Lcp6 determines exactly how much space it will need. If this space turns out to be larger than 9n bytes, we can quit Lcp6 and compute the LCP array using Lcp9. Thus, combining Lcp6 and Lcp9, we get an algorithm with a space occupancy between 6n and 9n bytes. The experimental results show that for linguistic texts, source code, and xml/html documents Lcp6 always uses less than 8n bytes, and for the largest files it often uses less than 7n bytes. For DNA sequences Lcp6 uses between 8n and 9n bytes, and, not surprisingly, for compressed files it uses close to 10n bytes.

¹ Recently, we found out that in [9] Mäkinen describes a space-economical version of the algorithm of Kasai et al. which only uses 9.125n bytes. We will report on Mäkinen’s algorithm in the full paper.

We point out that our algorithms are only of “practical” interest: from the theoretical point of view their working space of Θ(n log n) bits is not optimal. Indeed, the optimal space/time tradeoff can be obtained by combining the results in [4] and [15], which allow one to build the suffix array and the LCP array in linear time using O(n) bits of auxiliary storage. Unfortunately, the algorithms in [4,15] are quite complex, and it is still unclear whether they will lead to competitive practical algorithms.

2 Background and Notation 

Let Σ denote a finite ordered alphabet. Without loss of generality, in the following we assume that Σ consists of the integers 1, 2, ..., |Σ|. Let t[1, n] denote a text over Σ. For i = 1, ..., n we write t[i, n] to denote the suffix of t of length n − i + 1, that is, t[i, n] = t[i]t[i + 1] ⋯ t[n].

The suffix array [11] for t is the array Sa[1, n] such that t[Sa[1], n], t[Sa[2], n], ..., t[Sa[n], n] is the list of suffixes of t sorted in lexicographic order. To define the lexicographic order of the suffixes unambiguously, it is customary to logically append at the end of t a special end-of-string symbol # which is smaller than any symbol in Σ. For example, for t = baaba, Sa = [5, 2, 3, 4, 1], since t[5, 5] = a is the suffix with the lowest lexicographic rank, followed by t[2, 5] = aaba, followed by t[3, 5] = aba, and so on.

The rank array Rank[1, n] of t is the inverse of the suffix array. That is, Rank[i] = j if and only if Sa[j] = i. Note that Rank[i] is the rank of the suffix t[i, n] in the lexicographic order of the suffixes. The LCP array Lcp[1, n] of t is an array such that Lcp[i] contains the length of the longest common prefix between the suffix t[Sa[i], n] and its predecessor in the lexicographic order (which is t[Sa[i − 1], n]). Note that Lcp[1] is undefined since t[Sa[1], n] is the lexicographically smallest suffix and therefore has no predecessor.
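For small examples the three arrays can be computed directly from their definitions. The quadratic-time Python sketch below only makes the definitions concrete; it is not one of the algorithms discussed in this paper, and the function names are hypothetical. The text uses 1-based arrays, so Rank and Lcp are returned as lists with a dummy entry at index 0, while the suffix array is a plain list holding Sa[1..n]. Python's string comparison treats a proper prefix as smaller, which matches the convention of appending #.

```python
from typing import List, Optional

def suffix_array(t: str) -> List[int]:
    """Hypothetical helper: Sa[1..n] as 1-based positions, by sorting all suffixes."""
    return sorted(range(1, len(t) + 1), key=lambda i: t[i - 1:])

def rank_array(sa: List[int]) -> List[int]:
    """Rank with a dummy entry at index 0: Rank[i] = j iff Sa[j] = i."""
    rank = [0] * (len(sa) + 1)
    for j, i in enumerate(sa, start=1):
        rank[i] = j
    return rank

def lcp_array(t: str, sa: List[int]) -> List[Optional[int]]:
    """Lcp with a dummy entry at index 0; Lcp[1] is undefined (None)."""
    lcp: List[Optional[int]] = [None] * (len(sa) + 1)
    for j in range(2, len(sa) + 1):
        cur, prev = t[sa[j - 1] - 1:], t[sa[j - 2] - 1:]   # t[Sa[j], n] and t[Sa[j-1], n]
        h = 0
        while h < min(len(cur), len(prev)) and cur[h] == prev[h]:
            h += 1
        lcp[j] = h
    return lcp
```

For t = baaba this reproduces Sa = [5, 2, 3, 4, 1] from the example above.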

Finally, we define the RankNext map such that:

RankNext(i) = Rank[Sa[i] + 1], for i = 1, ..., n, i ≠ Rank[n]. (1)

RankNext(i) is the rank of the suffix t[Sa[i] + 1, n], that is, the rank of the suffix obtained by removing the first character from the suffix of rank i. Note that RankNext(·) is not defined for i = Rank[n], because in this case t[Sa[i] + 1, n] is the empty string.
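Definition (1) translates directly into code. The hypothetical helper below takes Sa as a list holding Sa[1..n] and Rank with a dummy entry at index 0, and it omits the undefined entry i = Rank[n].

```python
from typing import Dict, List

def rank_next_map(sa: List[int], rank: List[int]) -> Dict[int, int]:
    """Hypothetical helper: RankNext(i) = Rank[Sa[i] + 1] for i = 1..n, i != Rank[n]."""
    n = len(sa)
    return {i: rank[sa[i - 1] + 1] for i in range(1, n + 1) if sa[i - 1] != n}
```

For t = baaba (Sa = [5, 2, 3, 4, 1]) this gives RankNext(2) = 3, RankNext(3) = 4, RankNext(4) = 1 and RankNext(5) = 2.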
Fig. 1. The Burrows-Wheeler Transform for t = mississippi. The output of the transform is the last column of the sorted matrix M, i.e., bwt = ipssm#pissii.



2.1 The Burrows-Wheeler Transform

In 1994, Burrows and Wheeler [2] introduced a transform that turns out to be very elegant in itself and extremely useful for data compression. Given a string t, the transform consists of three basic steps (see Fig. 1): (1) append to the end of t a special symbol # smaller than any other symbol in Σ; (2) form a conceptual matrix M whose rows are the cyclic shifts of the string t#, sorted in lexicographic order; (3) construct the transformed text bwt by taking the last column of M. Notice that every column of M, hence also the transformed text bwt, is a permutation of t#.

If the input string t has length n, the transformed string bwt has length n + 1 because of the presence of the # symbol. In the following we assume that the transformed string is stored in an array indexed from 0 to n. For example, in Fig. 1 we have bwt[0] = i, bwt[5] = #, and bwt[11] = i. The rows of the matrix M, up to the symbol # on each row, are precisely the suffixes of t in lexicographic order. With this notation, bwt can easily be computed from t and the suffix array with the code of Fig. 2 (procedure Sa2Bwt). From bwt we can always recover t. The inverse transform is based on the following remarkable property. Let F[0, n] and L[0, n] denote respectively the first and last column of the matrix M (hence, L = bwt). Then, for any σ ∈ Σ, the k-th occurrence of σ in F corresponds to the k-th occurrence of σ in L. For example, in Fig. 1 the second i in F (that is, F[2]) corresponds to the second i in L (that is, L[7]), since both are the eighth symbol of mississippi. Similarly, the third s in F (F[10]) corresponds to the third s in L (L[8]), since both are the sixth symbol of mississippi.

Assume now that the character F[j] corresponds to L[i]. This means that
row i of M consists of a (rightward) cyclic shift of row j. Because of the
relationship between rows of M and suffixes of t, this is equivalent to stating that
the i-th suffix in the lexicographic order is equal to the j-th suffix with the
first symbol removed. In terms of the map RankNext defined by (1) we have
RankNext(j) = i. From this latter relationship it follows that from bwt we
can obtain the RankNext map. Indeed, we only need to scan the array bwt
(which coincides with column L) finding, for i = 0, ..., n, the character F[j]
corresponding to bwt[i] = L[i]. The resulting code is given in Fig. 2 (procedure
Bwt2RankNext). Note that column F is not represented explicitly (since it would
take O(n) space). Instead we use the array count[1, |Σ|]: at the beginning of the
i-th iteration count[k] contains the number of occurrences in t of the characters
1, 2, ..., k − 1 plus the number of occurrences of character k in bwt[0] ··· bwt[i − 1].
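A possible C rendering of Bwt2RankNext is sketched below. Unlike Fig. 2, it builds count[] itself from bwt (so that count[c] starts as the number of characters smaller than c), treats characters as unsigned bytes, and assumes '#' marks the end of string; the name bwt2ranknext is hypothetical. For bwt = ipssm#pissii it returns Rank(1) = 5 and fills rank_next[1..11] with 0, 7, 10, 11, 4, 1, 6, 2, 3, 8, 9.

```c
#include <assert.h>

/* Sketch of procedure Bwt2RankNext (Fig. 2). bwt[0..n] holds the
   transform; rank_next must have room for n+1 ints (index 0 unused).
   Returns Rank(1), i.e. the position of '#' in bwt. */
int bwt2ranknext(const char *bwt, int n, int *rank_next)
{
    int count[256] = {0};
    for (int i = 0; i <= n; i++)          /* occurrences of each character */
        if (bwt[i] != '#')
            count[(unsigned char) bwt[i]]++;
    for (int c = 0, sum = 0; c < 256; c++) {
        int occ = count[c];
        count[c] = sum;                   /* now: characters smaller than c */
        sum += occ;
    }
    int eos_pos = -1;
    for (int i = 0; i <= n; i++) {
        unsigned char c = (unsigned char) bwt[i];
        if (c == '#') {
            eos_pos = i;
        } else {
            int j = ++count[c];           /* F[j] is the occurrence matching L[i] */
            rank_next[j] = i;             /* RankNext(j) = i */
        }
    }
    assert(eos_pos >= 1);
    return eos_pos;
}
```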

Given the RankNext map and the array bwt, we can recover t as follows.
The position of the end-of-string symbol # in bwt gives us Rank(1), that is, the
position of t[1,n] in the suffix array. By (1), setting i = Rank(j) we get

$$\mathrm{RankNext}(\mathrm{Rank}(j)) = \mathrm{Rank}(\mathrm{Sa}[\mathrm{Rank}(j)] + 1) = \mathrm{Rank}(j + 1). \qquad (2)$$

Hence, given RankNext and Rank(1) we can generate the sequence of values
Rank(2), Rank(3), ..., Rank(n) using the recurrence (2). From the sequence
Rank(1), Rank(2), ..., Rank(n) we recover t using the relationship
t[i] = bwt[Rank(i + 1)], with the convention Rank(n + 1) = 0. The corresponding
code is shown in Fig. 2 (procedure RankNext2Text).
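The recurrence (2) and the relation t[i] = bwt[Rank(i + 1)] translate into the short loop below, a sketch of RankNext2Text under the same assumptions as before (1-based t, the convention Rank(n + 1) = 0, and a hypothetical function name ranknext2text). With bwt = ipssm#pissii, the RankNext values 0, 7, 10, 11, 4, 1, 6, 2, 3, 8, 9 and Rank(1) = 5 it rebuilds mississippi.

```c
#include <assert.h>

/* Sketch of RankNext2Text (Fig. 2). Inputs: bwt[0..n], rank_next[1..n],
   eos_pos = Rank(1). Output: t[1..n] followed by '\0' at t[n+1],
   so t must have room for n+2 chars. */
void ranknext2text(const char *bwt, const int *rank_next, int eos_pos,
                   int n, char *t)
{
    int k = eos_pos;              /* k = Rank(1) */
    int i = 1;
    do {
        k = rank_next[k];         /* recurrence (2): k = Rank(i+1) */
        assert(i <= n);
        t[i++] = bwt[k];          /* t[i] = bwt[Rank(i+1)]; Rank(n+1) = 0 */
    } while (k != 0);
    t[n + 1] = '\0';
}
```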

We conclude this section by observing that from the sequence Rank(1), ...,
Rank(n) we can also recover the suffix array, since k = Rank(i) implies Sa[k] =
i. The corresponding code is shown in Fig. 2 (procedure RankNext2SuffixArray).
Note that in RankNext2SuffixArray, as soon as we have read rank_next[k] in
Line 3, that entry is no longer needed. Therefore, if we replace Line 4 with the
instruction rank_next[k] = i++; we get a procedure which stores the suffix
array entries in the array rank_next, overwriting its old content
(the RankNext map). This property will be used in Section 4.
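The in-place variant of RankNext2SuffixArray can be sketched as follows (the name ranknext2sa_inplace is hypothetical; the RankNext map is passed in the array a and overwritten). Each entry a[k] is copied into nextk before being overwritten, which is exactly why no second array is needed.

```c
#include <assert.h>

/* Sketch of RankNext2SuffixArray with Line 4 replaced by
   rank_next[k] = i++. On entry a[1..n] holds the RankNext map and
   eos_pos = Rank(1); on exit a[1..n] holds the suffix array. */
void ranknext2sa_inplace(int *a, int eos_pos, int n)
{
    int k = eos_pos;            /* k = Rank(1) */
    int i = 1;
    while (k != 0) {
        int nextk = a[k];       /* Rank(i+1); the entry a[k] is now free */
        a[k] = i++;             /* Sa[Rank(i)] = i */
        k = nextk;
    }
    assert(i == n + 1);         /* every rank 1..n was visited once */
}
```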

2.2 The Algorithm of Kasai et al. 

The algorithm of Kasai et al. (algorithm Lcp13 from now on) takes as input
the text t[1,n] and the corresponding suffix array Sa[1,n] and returns the LCP
array. For i = 1, ..., n let ℓ_i denote the LCP between t[i,n] and the suffix
immediately preceding it in the lexicographic order (ℓ_i is undefined when t[i,n]
is the lexicographically smallest suffix). The algorithm Lcp13 computes the LCP
values in the order ℓ_1, ℓ_2, ..., ℓ_n.

The code of Lcp13 is shown in Fig. 3. As a first step (Line 1) the algorithm
computes the rank array Rank[1,n]. Then, at the i-th iteration of the main
loop (Lines 3-13) Lcp13 computes ℓ_i as follows. At Line 4 the value Rank[i] is
stored in the variable k. If Rank[i] = 1 then t[i,n] is the smallest suffix in the
lexicographic order and ℓ_i is undefined (we set it to −1 at Line 5). If Rank[i] > 1,
we compute j = Sa[Rank[i] − 1] (Line 7). Since t[j,n] is the suffix preceding t[i,n]
in the lexicographic order, ℓ_i is the longest common prefix between t[i,n]
and t[j,n].

The crucial observation, which ensures that Lcp13 runs in O(n) time, is that
whenever ℓ_i and ℓ_{i−1} are both defined we have ℓ_i ≥ ℓ_{i−1} − 1 (Theorem 1 in [6]).
Procedure Sa2Bwt
1. bwt[0]=t[n];
2. for(i=1; i<=n; i++) {
3.   if(sa[i] == 1)
4.     bwt[i] = '#';
5.   else
6.     bwt[i]=t[sa[i]-1];
7. }

Procedure Bwt2RankNext
1. for(i=0; i<=n; i++) {
2.   c = bwt[i];
3.   if(c == '#')
4.     eos_pos = i;
5.   else {
6.     j = ++count[c];
7.     rank_next[j]=i;
8.   }
9. }
10. return eos_pos;

Procedure RankNext2Text
1. k = eos_pos; i=1;
2. do {
3.   k = rank_next[k];
4.   t[i++] = bwt[k];
5. } while (k!=0);

Procedure RankNext2SuffixArray
1. k = eos_pos; i=1;
2. while (k!=0) {
3.   nextk = rank_next[k];
4.   sa[k] = i++;
5.   k = nextk;
6. }

Fig. 2. Algorithms related to the Burrows-Wheeler Transform. Procedure Sa2Bwt computes
the array bwt given the text t and the suffix array sa. Procedure Bwt2RankNext
stores in rank_next the RankNext map and returns the value Rank(1). The procedure
uses the auxiliary array count[1, |Σ|] which initially contains in count[i] the
number of occurrences in bwt (and therefore in t) of the characters 1, ..., i − 1.
Procedure RankNext2Text recovers the text t given the arrays bwt and rank_next and the
value Rank(1) stored in eos_pos. Procedure RankNext2SuffixArray computes the suffix
array given rank_next and the value Rank(1) stored in eos_pos.



Procedure Lcp13
1. for(i=1; i<=n; i++) rank[sa[i]] = i;
2. h=0;
3. for(i=1; i<=n; i++) {
4.   k = rank[i];
5.   if(k==1) lcp[k]=-1;
6.   else {
7.     j = sa[k-1];
8.     while(i+h<=n && j+h<=n && t[i+h]==t[j+h])
9.       h++;
10.    lcp[k] = h;
11.  }
12.  if(h>0) h--;
13. }

Fig. 3. Algorithm of Kasai et al. for the linear time computation of the LCP array.
The algorithm takes as input the text t and the suffix array sa and stores in lcp the
LCP array. The algorithm uses an auxiliary array rank to store the rank array (which
is the inverse of the suffix array).
To use this property Lcp13 maintains the invariant that at the beginning of the
i-th iteration the variable h contains the value ℓ_{i−1} − 1 (or 0 if ℓ_{i−1} = 0). Hence, ℓ_i is computed by
comparing t[i,n] and t[j,n] starting from position h, that is, skipping the first h characters (Lines 8-9). Note that at
Line 10 Lcp13 stores ℓ_i in Lcp[Rank[i]], since the definition of the LCP array states
that Lcp[k] contains the LCP between t[Sa[k],n] and t[Sa[k − 1],n].
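For reference, here is a C sketch of Lcp13 with 1-based arrays t[1..n], sa[1..n] and lcp[1..n]; the rank array allocated with malloc is where the extra 4n bytes come from. The function name lcp13 and the −1 return value on allocation failure are our own choices. On mississippi it yields the LCP array −1, 1, 1, 4, 0, 0, 1, 0, 2, 1, 3.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of Lcp13 (Fig. 3), the algorithm of Kasai et al.
   t[1..n], sa[1..n] in; lcp[1..n] out. Returns 0, or -1 if malloc fails. */
int lcp13(const char *t, const int *sa, int n, int *lcp)
{
    assert(n >= 1);
    int *rank = malloc((size_t) (n + 1) * sizeof *rank);  /* the extra 4n bytes */
    if (rank == NULL) return -1;
    for (int i = 1; i <= n; i++) rank[sa[i]] = i;         /* Line 1 */
    int h = 0;
    for (int i = 1; i <= n; i++) {
        int k = rank[i];
        if (k == 1) {
            lcp[k] = -1;                 /* smallest suffix: l_i undefined */
        } else {
            int j = sa[k - 1];           /* suffix preceding t[i,n] */
            while (i + h <= n && j + h <= n && t[i + h] == t[j + h])
                h++;                     /* the first h characters already match */
            lcp[k] = h;
        }
        if (h > 0) h--;                  /* l_{i+1} >= l_i - 1 */
    }
    free(rank);
    return 0;
}
```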

In our “real world” model, algorithm Lcp13 requires n bytes for the array t
and 4n bytes for each one of the arrays Sa, Rank, and Lcp. Therefore its peak
space occupancy is 13n bytes.

3 LCP Computation in 9n Bytes of Storage 

In this section we show how to modify the algorithm of Kasai et al. for computing 
the LCP array in linear time without using any auxiliary array. As a result we 
get an algorithm which only uses 9n bytes of storage. Our approach consists of
using the array lcp for storing both “rank information” and “LCP information”.
Initially the array contains only “rank information”. Then, at each iteration of
the main loop one item of rank information is used and replaced by one item
of LCP information. At the end of the computation the array lcp only contains
LCP information.

Our starting point is the observation that algorithm Lcp13 (Fig. 3) uses the
rank information only in Line 4 where, during the i-th iteration of the main loop,
the algorithm retrieves the value Rank(i). Therefore, Lcp13 uses the sequence
of rank values Rank(1), Rank(2), ..., Rank(n) exactly in this order. Moreover,
after the i-th iteration of the main loop the value Rank(i) is no longer needed.

In Section 2.1 we have shown that using the recurrence (2) we can generate
the sequence Rank(1), Rank(2), ..., Rank(n) given the RankNext map
and the value Rank(1). The above observations suggest the algorithm Lcp9
whose code is shown in Fig. 4. In the first step of Lcp9 (Line 1) we call the
procedure Sa2RankNext which, for j = 1, ..., n, stores the value RankNext(j)
in lcp[j], and returns the value Rank(1). Then, in the i-th iteration of the
main loop (Lines 3-15), given Rank(i) we retrieve Rank(i + 1) from entry
lcp[Rank(i)]. Note that as soon as we have retrieved Rank(i + 1) we can
use the entry lcp[Rank(i)] for storing the LCP relative to t[i,n].

Summing up, the main loop of algorithm Lcp9 (Lines 3-15) works as follows.
At the beginning of the i-th iteration the variable k contains the value Rank(i).
In the body of the loop we store in nextk the value lcp[k], which is Rank(i + 1);
then we compute ℓ_i (the LCP between t[i,n] and the suffix preceding it) and we
store it in lcp[k], which is the right place since k = Rank(i). Finally, we update
k (Line 14) and we start the next iteration. Note that the actual computation
of ℓ_i is done as in the Lcp13 algorithm; indeed, Lines 5-13 of Lcp9 are identical
to Lines 5-12 of Lcp13 (in Fig. 4 the while condition simply spans two lines). The only
difference between our algorithm and the one of Kasai et al. is the computation of the
rank information using the RankNext map rather than the rank array.

We conclude by observing that the correctness of the procedure Sa2RankNext
follows from the correctness of Bwt2RankNext in Fig. 2 and from the relationship
between the suffix array and the Burrows-Wheeler Transform (see the procedure
Sa2Bwt in Fig. 2).

Procedure Lcp9
1. k = Sa2RankNext(lcp);
2. h=0;
3. for(i=1; i<=n; i++) {
4.   nextk = lcp[k];
5.   if(k==1) lcp[k]=-1;
6.   else {
7.     j = sa[k-1];
8.     while(i+h<=n && j+h<=n
9.           && t[i+h]==t[j+h])
10.      h++;
11.    lcp[k] = h;
12.  }
13.  if(h>0) h--;
14.  k=nextk;
15. }

Procedure Sa2RankNext(rank_next)
1. j = ++count[t[n]];
2. rank_next[j]=0;
3. for(i=1; i<=n; i++) {
4.   if(sa[i]==1)
5.     eos_pos = i;
6.   else {
7.     c = t[sa[i]-1];
8.     j = ++count[c];
9.     rank_next[j]=i;
10.  }
11. }
12. return eos_pos;

Fig. 4. Algorithm Lcp9 for linear time computation of the LCP array using 9n bytes of
storage. The algorithm takes as input the text t and the suffix array sa and stores in
lcp the LCP array. The procedure Sa2RankNext computes the RankNext map given
t and sa. After the procedure call at Line 1 of Lcp9 the RankNext map is stored in
the array lcp and the value Rank(1) is stored in the variable k.
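The following C sketch combines Sa2RankNext and the main loop of Lcp9 (the names sa2ranknext and lcp9 are hypothetical). As in the earlier sketches, arrays are 1-based, characters are compared as unsigned bytes, and count[] is built inside the helper rather than passed in. Apart from t and sa, the only array is lcp, which first holds RankNext; each entry is read into nextk before it is overwritten with an LCP value.

```c
#include <assert.h>

/* Sketch of Sa2RankNext (Fig. 4): stores RankNext in rank_next[1..n]
   directly from t[1..n] and sa[1..n]; returns Rank(1). */
static int sa2ranknext(const char *t, const int *sa, int n, int *rank_next)
{
    int count[256] = {0}, eos_pos = -1;
    for (int i = 1; i <= n; i++) count[(unsigned char) t[i]]++;
    for (int c = 0, sum = 0; c < 256; c++) {     /* count[c] = #chars < c */
        int occ = count[c];
        count[c] = sum;
        sum += occ;
    }
    rank_next[++count[(unsigned char) t[n]]] = 0;     /* bwt[0] = t[n] */
    for (int i = 1; i <= n; i++) {
        if (sa[i] == 1) {
            eos_pos = i;                              /* bwt[i] = '#' */
        } else {
            unsigned char c = (unsigned char) t[sa[i] - 1];   /* bwt[i] */
            rank_next[++count[c]] = i;
        }
    }
    return eos_pos;
}

/* Sketch of Lcp9 (Fig. 4): 9n bytes in the "real world" model. */
void lcp9(const char *t, const int *sa, int n, int *lcp)
{
    int k = sa2ranknext(t, sa, n, lcp);   /* k = Rank(1) */
    assert(k >= 1);
    int h = 0;
    for (int i = 1; i <= n; i++) {
        int nextk = lcp[k];                /* Rank(i+1), read before overwrite */
        if (k == 1) {
            lcp[k] = -1;
        } else {
            int j = sa[k - 1];
            while (i + h <= n && j + h <= n && t[i + h] == t[j + h])
                h++;
            lcp[k] = h;                    /* right slot, since k = Rank(i) */
        }
        if (h > 0) h--;
        k = nextk;
    }
}
```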



4 LCP Computation in (6 + δ)n Bytes of Storage

In this section we describe the algorithm Lcp6 which computes the LCP array 
“overwriting” the suffix array in the sense that the LCP array is stored in the 
same array which initially contains the suffix array entries. 

Recall that the correctness of the algorithm of Kasai et al. follows from the
observation that whenever ℓ_i and ℓ_{i−1} are both defined we have ℓ_i ≥ ℓ_{i−1} − 1 (see
Section 2.2). The following Lemma (see [12] for the proof) shows that using the
Burrows-Wheeler Transform of t we can say something more about the relationship
between ℓ_i and ℓ_{i−1}.

Lemma 1. Let bwt denote the Burrows-Wheeler Transform of t, and let k =
Rank(i). If k > 1 and bwt[k] = bwt[k − 1] then ℓ_i = ℓ_{i−1} − 1. □

Assume now that the array bwt is available, and consider the main loop of
Lcp9 (Lines 3-15 in Fig. 4). At the beginning of the i-th iteration the variable
k contains the value Rank(i). By Lemma 1, if bwt[k] = bwt[k − 1] we
know that ℓ_i = ℓ_{i−1} − 1. Since ℓ_{i−1} − 1 is stored in the variable h, we conclude
that, if bwt[k] = bwt[k − 1], we can skip Lines 8-10 and proceed with the next
iteration. This means that for computing the LCP array we only need the values
Sa[k − 1] for all k such that bwt[k] ≠ bwt[k − 1]. This observation is the
starting point of our algorithm.

Let z denote the number of indexes k, 2 ≤ k ≤ n, such that bwt[k − 1] ≠ bwt[k]
(that is, n − 1 minus the number of pairs of consecutive equal characters in
bwt[1, n]). In the algorithm Lcp6 (see Figure 5) we evaluate z with a scan
of bwt and we allocate an array sa_aux of size z for storing those suffix array
entries that are needed for computing the LCP array (Lines 2-4). Although
we already know which suffix array entries must be stored in sa_aux, to retrieve
these entries efficiently we must store them in the proper order. Let k_1, k_2, ..., k_z,
with k_1 < k_2 < ··· < k_z, denote the indexes such that bwt[k_i] ≠ bwt[k_i − 1]. By
the above discussion we know that we must store in sa_aux the values Sa[k_1 −
1], Sa[k_2 − 1], ..., Sa[k_z − 1]. Note, however, that the value Sa[k_i − 1] is needed
when we process the suffix t[Sa[k_i], n]. Since the main loop of the LCP algorithm
considers the suffixes in the order t[1,n], t[2,n], ..., t[n,n], in Lcp6 we store in
sa_aux[i] the value Sa[k_π(i) − 1], where π is a permutation of 1, ..., z such that

$$\mathrm{Sa}[k_{\pi(1)}] < \mathrm{Sa}[k_{\pi(2)}] < \cdots < \mathrm{Sa}[k_{\pi(z)}]. \qquad (3)$$

In other words, we store in sa_aux the suffix array entries in the order in which 
they will be used by the LCP algorithm. This will make the retrieval a very 
simple task. 

To obtain such a convenient arrangement of the suffix array entries within
sa_aux, the algorithm Lcp6 uses the following two-step procedure. In the first
step (Lines 6-10) the algorithm computes the RankNext map, storing it in the
array sa. Then, it generates the sequence Rank(1), Rank(2), ..., Rank(n), thus
traversing the suffix array entries in the order in which they will be considered by
the LCP computation. When Lcp6 finds an index k > 1 such that bwt[k − 1] ≠ bwt[k]
it stores k − 1 in the next empty position of sa_aux (Line 8). Hence, at the end of
this first step, for i = 1, ..., z, the entry sa_aux[i] contains the value k_π(i) − 1,
where π is the permutation defined by (3). In the second step (Lines 12-14)
the algorithm recomputes the suffix array and, with a simple scan over sa_aux,
stores in sa_aux[i] the value Sa[k_π(i) − 1]. Note that we use this elaborate two-step
procedure simply because we do not want to store at the same time both
the suffix array and the RankNext map.

Once the array sa_aux is properly initialized, the computation of the LCP
array proceeds as in algorithm Lcp9. First, we store the RankNext map in the
array sa (Line 16). Then, at each iteration of the main loop (Lines 18-30) a
RankNext value in sa is replaced by an LCP value, so that at the end of the
loop sa contains the LCP array. The computation of the value ℓ_i makes use of
Lemma 1. At the beginning of the i-th iteration the variable k contains Rank(i);
if bwt[k-1]==bwt[k] then ℓ_i is equal to ℓ_{i−1} − 1 (which is readily available since it
is stored in the variable h); otherwise we retrieve from sa_aux the value Sa[k − 1]
(Line 23) and we compute ℓ_i with the while loop of Lines 24-25.
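A self-contained C sketch of Lcp6 follows. The names bwt2ranknext, ranknext2sa and lcp6 are hypothetical, '#' is assumed absent from t, and an allocation failure is reported by returning −1. The sketch walks all n ranks Rank(1), ..., Rank(n) in both passes and tests k > 1 explicitly, so sa_aux is filled and consumed in the same order; it keeps Rank(1) in a separate variable eos because the first walk consumes k.

```c
#include <assert.h>
#include <stdlib.h>

/* RankNext map of bwt[0..n] into a[1..n]; returns Rank(1). */
static int bwt2ranknext(const char *bwt, int n, int *a)
{
    int count[256] = {0}, eos_pos = -1;
    for (int i = 0; i <= n; i++)
        if (bwt[i] != '#') count[(unsigned char) bwt[i]]++;
    for (int c = 0, sum = 0; c < 256; c++) {   /* count[c] = #chars < c */
        int occ = count[c];
        count[c] = sum;
        sum += occ;
    }
    for (int i = 0; i <= n; i++) {
        if (bwt[i] == '#') eos_pos = i;
        else a[++count[(unsigned char) bwt[i]]] = i;
    }
    return eos_pos;
}

/* In-place RankNext2SuffixArray: a[1..n] goes from RankNext to Sa. */
static void ranknext2sa(int *a, int k)
{
    for (int i = 1; k != 0; i++) {
        int nextk = a[k];
        a[k] = i;
        k = nextk;
    }
}

/* Sketch of Lcp6: sa[1..n] holds the suffix array on entry and the LCP
   array on exit (-1 for the smallest suffix); t[1..n], bwt[0..n]. */
int lcp6(const char *t, const char *bwt, int *sa, int n)
{
    int z = 0;
    for (int i = 2; i <= n; i++)                 /* Lines 2-3 */
        if (bwt[i - 1] != bwt[i]) z++;
    int *sa_aux = malloc((size_t) (z > 0 ? z : 1) * sizeof *sa_aux);
    if (sa_aux == NULL) return -1;

    /* Step 1: record k-1 for every needed k, in the order Rank(1..n). */
    int eos = bwt2ranknext(bwt, n, sa);          /* sa[] = RankNext */
    int v = 0;
    for (int i = 1, r = eos; i <= n; i++) {
        if (r > 1 && bwt[r - 1] != bwt[r]) sa_aux[v++] = r - 1;
        r = sa[r];
    }
    assert(v == z);
    /* Step 2: rebuild the suffix array and replace k-1 by Sa[k-1]. */
    ranknext2sa(sa, eos);
    for (v = 0; v < z; v++) sa_aux[v] = sa[sa_aux[v]];

    /* Main loop: as in Lcp9, but Lemma 1 avoids most comparisons. */
    int k = bwt2ranknext(bwt, n, sa), h = 0;
    v = 0;
    for (int i = 1; i <= n; i++) {
        int nextk = sa[k];                       /* Rank(i+1) */
        if (k == 1) {
            sa[k] = -1;
        } else if (bwt[k - 1] == bwt[k]) {
            sa[k] = h;                           /* l_i = l_{i-1} - 1 */
        } else {
            int j = sa_aux[v++];                 /* Sa[k-1] */
            while (i + h <= n && j + h <= n && t[i + h] == t[j + h])
                h++;
            sa[k] = h;
        }
        if (h > 0) h--;
        k = nextk;
    }
    free(sa_aux);
    return 0;
}
```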

In our “real world” model the total space occupancy of the above algorithm
is 6n + 4z bytes: we use 2n bytes for the arrays t and bwt, 4n bytes for the
array sa (which is used for storing the suffix array, the RankNext map, and
the LCP array), and 4z bytes for sa_aux. This latter amount depends on the
structure of the input. More precisely, in [10] it is proven that for any k > 1 we
have z < |Σ|^k + 2nH_k, where |Σ| is the alphabet size and H_k is the k-th order
entropy of the input. In practice, for linguistic texts and other “structured”
texts the Burrows-Wheeler Transform usually contains many repetitions and
consequently z is relatively small. If z ≈ n/2 (which is not an unusual value) the
total space occupancy of Lcp6 is ≈ 8n bytes. In the worst case z approaches n
and our algorithm uses up to 10n bytes. However, if at Line 4 we find that z > 3n/4
(which would yield a space occupancy larger than 9n bytes), we can quit Lcp6
and use Lcp9 instead.



Algorithm Lcp6

1. // count how many suffix array entries we need
2. for(z=0, i=2; i<=n; i++)
3.   if(bwt[i-1]!=bwt[i]) z++;
4. sa_aux=malloc(z*sizeof(int)); // allocate sa_aux[0,z-1]
5. // determine order in which suffix array entries are needed
6. k = Bwt2RankNext(sa); // store RankNext in sa[]
7. for(v=0, i=1; i<=n; i++) {
8.   if(k>1 && bwt[k-1]!=bwt[k]) sa_aux[v++]=k-1;
9.   k=sa[k];
10. }
11. // store needed suffix array entries in sa_aux
12. RankNext2SuffixArray(sa); // store Suffix Array in sa[]
13. for(v=0; v<z; v++)
14.   sa_aux[v] = sa[sa_aux[v]];
15. // compute the lcp array as usual
16. k = Bwt2RankNext(sa); // store RankNext in sa[]
17. v=h=0;
18. for(i=1; i<=n; i++) {
19.   nextk = sa[k];
20.   if(k==1) sa[k]=-1;
21.   else if(bwt[k-1]==bwt[k]) sa[k]=h;
22.   else {
23.     j = sa_aux[v++]; // retrieve sa[k-1]
24.     while(i+h<=n && j+h<=n && t[i+h]==t[j+h])
25.       h++;
26.     sa[k] = h;
27.   }
28.   if(h>0) h--;
29.   k=nextk;
30. }

Fig. 5. Algorithm Lcp6 for linear time computation of the LCP array using (6 + δ)n
bytes of storage. The algorithm takes as input the text t, the Burrows-Wheeler
Transform bwt, and the suffix array sa and stores the LCP values in sa (thus overwriting the
suffix array entries). The algorithm uses an auxiliary array sa_aux whose size depends
on the structure of the input text. After the procedure calls at Lines 6 and 16 the
RankNext map is stored in sa and the value Rank(1) is stored in k. The procedure
call at Line 12 stores the suffix array in the array sa overwriting the RankNext map
(see comment at the end of Sect. 2.1).
5 Experimental Results

We have tested the algorithms Lcp13, Lcp9, and Lcp6 on a collection of files with
different lengths and structures (a more detailed experimental analysis can be
found in [12]). For each file we built the suffix array using the ds algorithm [13,14],
which is currently one of the fastest suffix array construction algorithms. Then,
the text and the suffix array were given as input to the algorithms Lcp13, Lcp9,
and Lcp6 and their running times were measured considering (user+system) time
averaged over five runs. For all tests we used a 1700 MHz Pentium 4 running
GNU/Linux with 1.25 GB main memory and 256 KB L2 cache.

In Table 1 we report, for each file and for each algorithm, the running time
over file length. For Lcp6 we also report the space occupancy expressed as total
space occupancy over file length. The files in Table 1 are ordered by increasing
average LCP: a large average LCP indicates that the input file contains many
long repeated substrings. Note that the file etext99.gz has a very small average
LCP since it is a compressed file and essentially consists of a “random” sequence
over the alphabet {0, 1, ..., 255}. The file chr22 is a DNA sequence and consists
of an apparently random sequence over the alphabet {a, c, g, t}: its relatively
high average LCP is due to the small cardinality of the underlying alphabet.

Our first observation is that Lcp9 is roughly 10% slower than Lcp13. We
also notice that for most files both LCP algorithms are faster than the suffix
array construction algorithm. Thus, if we consider the combined time required
to compute the suffix array and the LCP array, the overhead for using Lcp9 is usually
less than 5% of the total running time. For the algorithm Lcp6 we observe that
it is roughly two times slower than Lcp13. However, we also notice that for most
files Lcp6 uses less than 8n bytes. The exceptions are, as expected, etext99.gz
and chr22. We conclude that, although Lcp6 is slower than Lcp13 and Lcp9, for
most files it yields a significant saving in the peak space occupancy. For very
large files the combination of Lcp6 with a “lightweight” suffix sorter [1,14] can
be the only way to avoid the (deleterious) use of secondary memory.

Table 1. Experimental results for LCP construction algorithms. The second and third
columns show the size and average LCP of the input file. The fourth column reports the
time (microseconds per input byte) for the construction of the suffix array. The next
three columns report the time (microseconds per input byte) for the computation of the
LCP array using the algorithms Lcp13, Lcp9, and Lcp6, respectively. The last column
shows the space used by Lcp6 expressed as total space occupancy over file length.

File         Size (KB)    Ave. LCP   SA time   Lcp13 time   Lcp9 time   Lcp6 time   Lcp6 space
etext99.gz      38,747        2.65      0.97         1.07        1.18        2.08         9.97
sprot          107,048       89.08      1.49         1.00        1.03        1.90         7.01
rfc            113,693       93.02      1.18         0.89        0.92        1.66         6.86
howto           38,498      267.56      0.99         0.77        0.84        1.48         7.29
reuters        112,022      282.07      2.65         0.91        0.96        1.77         6.58
linux          113,530      479.00      1.04         0.76        0.76        1.35         6.88
jdk13           68,094      678.94      2.67         0.69        0.75        1.33         6.26
etext99        102,809    1,108.63      1.55         1.07        1.10        2.02         7.57
chr22           33,743    1,979.25      0.96         0.92        1.01        1.76         8.34
gcc             84,600    8,603.21      1.87         0.69        0.73        1.30         6.75
w3c            101,759   42,299.75      2.11         0.72        0.79        1.40         6.31
Abstract. We present a solution to the fully-dynamic all-pairs shortest
path problem for a directed graph with arbitrary weights allowing negative
cycles. We support each vertex update in O(n^2(log n + log^2(m̂/n)))
amortized time. Here, n is the number of vertices, m the number of edges
and m̂ = n + m. A vertex update inserts or deletes a vertex with all
incident edges, and we update a complete distance matrix accordingly.
The algorithm runs on a comparison-addition based pointer-machine.



1 Introduction 

Recently Demetrescu and Italiano [1] presented an exciting new approach to
the fully-dynamic all-pairs shortest path (APSP) problem with positive weights.
Their algorithm supports each vertex update in O(n^2 log^3 n) amortized time¹.
Here n is the number of vertices. A vertex update inserts or deletes a vertex with
all its incident edges. Between updates, a complete distance matrix is maintained.
The algorithm also maintains the next hop on a shortest path from any vertex
towards any destination. The algorithm runs on a comparison-addition based
pointer-machine.

We refer the reader to [1] for the rich history of dynamic shortest path
problems which has publications dating back to 1967. Here we just note that before
[1], the best amortized update time for APSP in general graphs was
with unit weights [6]. The new O(n^2 log^3 n) amortized update time from [1] is a
substantial improvement and allows arbitrary non-negative weights.

1.1 An Even Faster Fully-Dynamic APSP Algorithm 

In this paper, we present a different version of the algorithm from [1], maintaining
the same type of information, but being easier to analyze and tune, getting
tighter bounds, and thus providing a better understanding of the general new
approach. Our amortized update time is O(n^2(log n + log^2(m̂/n))). We also
reduce the space from O(nm log n) to O(mn). Here m is the number of edges and
m̂ = m + n.

¹ In the final remarks of [1], Demetrescu and Italiano state that their bounds can be
improved to O(n^2 log^2 n) using a Fibonacci heap, but that claim is withdrawn in [2].
While the improvement is only by one or two log-factors, depending on the
sparsity of the graph, it should be compared with a lower bound of Ω(n^2) needed
just to update the distance matrix. Thus, we cannot improve by more than
log-factors. Also, our algorithm picks distances out of a priority queue, and since we
may have O(n^2) distance changes per vertex update, any such algorithm needs
Ω(n^2 log n) comparisons. It is also interesting to compare our algorithm with
the standard static competitor, running Dijkstra’s [3] single source algorithm
from each source with a Fibonacci heap [4] in O(n^2 log n + nm) total time. Our
O(n^2(log n + log^2(m̂/n))) bound is never worse, and it is an improvement
whenever m = ω(n log n). We note that there is a faster comparison-addition based
APSP algorithm [8] running in O(n^2 log log n + nm) time, but that algorithm is
not based on priority queues.
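The comparison with repeated Dijkstra follows from a short calculation. A sketch, using m̂/n ≥ 1 and the fact that log² x = O(x) for x ≥ 1:

```latex
\begin{aligned}
n^2\log^2(\hat m/n) &= O\bigl(n^2\cdot \hat m/n\bigr) = O(n\hat m) = O(nm + n^2)
  && \text{since } \log^2 x = O(x) \text{ for } x = \hat m/n \ge 1,\\
n^2\bigl(\log n + \log^2(\hat m/n)\bigr) &= O(n^2\log n + nm)
  && \text{(never worse)},\\
n^2\bigl(\log n + \log^2(\hat m/n)\bigr) &= o(nm) + o(n\hat m) = o(nm)
  && \text{if } m = \omega(n\log n).
\end{aligned}
```

In the last line, m = ω(n log n) gives n² log n = o(nm) and m̂/n → ∞, so log²(m̂/n) = o(m̂/n); moreover n m̂ = nm + n² = (1 + o(1)) nm. In this range the static bound O(n² log n + nm) is dominated by its nm term, which is where the improvement comes from.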



1.2 Allowing Negative Weights and Cycles 

Our new version can be extended to deal with negative weights, allowing negative 
cycles. For these arbitrary weights, we get the same amortized update time of 
O(n^2(log n + log^2(m̂/n))). To the best of our knowledge, for arbitrary weights,
this is the first fully-dynamic APSP algorithm with better amortized updates 
than a static recomputation from scratch. The extension is non-trivial, but, 
unfortunately, there is not room for it in this extended abstract. 



1.3 Notation 

The vertex set of a graph G is denoted V(G) and the edge set is denoted E(G).
If U ⊆ V(G), then G \ U denotes the subgraph of G where we have removed
the vertices from U with their incident edges. As a slight abuse of notation, if v
is a single vertex, we define G \ v = G \ {v}. If v precedes w in a path P, then
P[v,w] denotes the subpath from v to w of P. Also, first(P) and last(P) denote
the first and the last vertex in P. An s-t path is a path P with s = first(P) and
t = last(P). If P and Q are paths with last(P) = first(Q), then PQ denotes the
concatenation of P and Q.

2 The Approach of Demetrescu and Italiano 

In this section, we present the new approach of Demetrescu and Italiano to the 
dynamic APSP problem [2]. The presentation is, however, directed towards our 
own developments to be presented in the subsequent sections. 

We say that two paths are alternatives if they start and finish in the same 
vertices. A path is a shortest path if there is no shorter alternative. We assume 
that shortest paths are unique, that is, for a shortest path all alternatives are 
longer. In [2], an elegant way of achieving this uniqueness is presented, even in
the deterministic case and without loss of efficiency.
2.1 Selecting Shortest Paths Via Generated Paths

The dynamic APSP algorithm operates on a set of selected paths. Generally, 
selected paths are paths that at some stage have been identified as shortest. 
After each vertex update, the algorithm will make sure to select all current 
shortest paths. It may also de-select some selected paths so as to make sure that 
not too many paths are selected. 

Demetrescu and Italiano [2] presented a very interesting approach for
selecting shortest paths. A path P is generated if we get a selected path no matter
which end-point of P we remove. We say that a path P is improving if it is
strictly shorter than any selected alternative.²

Trivial paths consisting of a single vertex form a special case, for if we remove 
that vertex, we have nothing left. We define all trivial paths to be generated. 

Lemma 1. (a) Let Q be a shortest path which is not selected. Then Q has an
improving generated subpath R. In particular, if there are no improving
generated paths, then all shortest paths are selected.

(b) Let P have minimum length amongst all generated improving paths. Then P
is a shortest path which is not yet selected.

Proof. To prove (a), let R be a minimal subpath of Q that is not selected. Then
R is generated and shortest but not selected, so R is improving.

To prove (b), suppose for a contradiction that P is not shortest and consider a
shortest alternative Q of P. Since P is improving, we know that Q is not selected.
Hence by (a), we have an improving generated subpath R of Q. Now length(R) ≤
length(Q) < length(P), contradicting the choice of P. Thus we conclude that P
is a shortest path. □

Lemma 1 provides us with a process to select all shortest paths. As long as there
are improving generated paths, we select such a path of minimal length. By
Lemma 1, this process selects exactly the shortest paths which were not already
selected when we started.



2.2 A Path System with Priority Queues 

In order to implement the selection of all shortest paths, [2] presents a path 
system to maintain selected and generated paths. We refer to all paths in the 
system as system paths, noting that the same path may be both selected and 
generated. 

The path system knows the graph, so when a vertex v is inserted, the trivial
path (v) is generated immediately. Whenever a path is selected, the system
combines it with previously selected paths into new generated paths. We can only
select a path if it has been generated by the system. We can ask the system
to destroy all system paths containing a given vertex. This happens
automatically when a vertex is deleted from the graph. The path system is implemented
below in §2.3 in constant time per system path change. The above operations
maintain that all selected paths are generated. Recursively this implies that any
subpath of a selected (generated) path is selected (generated).

² Our “selected paths” are the “zombies” in [1] and the “historically shortest paths” in
[2]. Our “generated paths” are the “potentially uniform paths” in [1] and the “locally
historical paths” in [2]. The concept of “improving generated paths” is important in
[1,2] but was not named. Our terminology is shorter, and more convenient when we
later want to talk about other types of generated paths.

For each start-finish vertex pair (s,t), we have a start-finish priority
queue Q(s,t) with all system paths from s to t, the shorter with higher priority.
If the shortest system path P in Q(s,t) is not selected, then P is an improving
generated path which participates in a global priority queue Q_g. Using classic
comparison-based priority queues [11], each operation on a queue is supported
in O(log n) time.

The selection of all shortest paths is now implemented as follows. As long as
the global priority queue Q_g is non-empty, we select the shortest path from Q_g.
When Q_g is empty, for each s and t, the shortest s-t path is found at the top of Q(s,t).



2.3 Implementing the Path System 

We now show how to implement the path system itself. Currently, all system 
paths are generated paths, and all subpaths of generated paths are generated 
paths. Every system path is given a unique identifier, from which we can derive 
information such as end-points, length, and first edge. 

If P is non-trivial, we say that P is a pre-extension of P \ first(P) and a
post-extension of P \ last(P). If Q is generated, we store with Q the set of its generated
pre-extensions and the set of its generated post-extensions. Using these sets, we
can identify and destroy all generated paths containing a given vertex in constant
time per path.

Together with Q we also store the sets of selected pre- and post-extensions.
When a new pre-extension P_1 of Q is selected, we take each currently selected
post-extension P_2 of Q, and generate the new path P_1 ∪ P_2. The case when a
new post-extension is selected is symmetric. Thus each new path is generated in
constant time.

In the above path system, we pay constant time per path change. This is
dominated by the O(log n) time it takes to modify the start-finish priority queues
and the global priority queue in §2.2.



2.4 A Basic Dynamic APSP Algorithm 

Using the above path system, we have a basic algorithm for the dynamic APSP 
problem. If a vertex is inserted, we select all shortest paths. If a vertex is deleted, 
we first destroy all system paths containing it, and then select all shortest paths. 



2.5 The Key to Efficiency 

The following lemma is crucial to efficiency: 
Lemma 2. Suppose all selected paths containing v are shortest. If s and t are
vertices different from v, there is at most one generated s-t path which contains
v. Moreover, there are at most O(n^2) generated paths containing v.

Proof. Consider a generated s-t path P which contains v. Then P[s,v] is
contained in the selected path P \ t, which contains v. Hence P[s,v] is the shortest
s-v path. Symmetrically, P[v,t] is the shortest v-t path, so P is unique. It
immediately follows that v is internal to at most n^2 generated paths.

Now, consider a generated path P from v to some vertex t, and let t' be
the predecessor of t in P. Then P[v,t'] is a selected path, so P is uniquely
determined by t' and t. Symmetrically, there are at most n^2 choices of generated
paths finishing in v. □

2.6 Efficiency of the Static Case 

In the static case, in the process of selecting all shortest paths, the path system
will first generate all trivial paths. We will then continue to select and generate
paths until we have selected the set of all shortest paths.
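To make the selection process of Lemma 1 and the path system of §2.3 concrete, here is a small C sketch of the static case under simplifying assumptions: non-negative edge weights, no self-loops, unique shortest paths, and at most MAXV vertices. A linear scan over all system paths stands in for the priority queues Q(s,t) and Q_g, so the sketch does not achieve the O(n^3 log n) bound; the names static_apsp, generate and select_path and the array layout are hypothetical. Each path is an integer id that records the ids of P \ last(P) and P \ first(P), and each path keeps intrusive lists of its selected pre- and post-extensions.

```c
#include <assert.h>
#include <limits.h>

#define MAXV 8
#define MAXP 4096
#define NONE (-1)

/* Path p runs from first_[p] to last_[p] with length len_[p]; lft[p] is the
   id of p minus its last vertex, rgt[p] the id of p minus its first vertex
   (both NONE for a trivial path). */
static int  first_[MAXP], last_[MAXP], lft[MAXP], rgt[MAXP];
static long len_[MAXP];
static int  selected[MAXP], npaths;
static int  pre_head[MAXP], pre_next[MAXP];    /* selected pre-extensions  */
static int  post_head[MAXP], post_next[MAXP];  /* selected post-extensions */

static int generate(int f, int l, long len, int lp, int rp)
{
    assert(npaths < MAXP);
    int p = npaths++;
    first_[p] = f; last_[p] = l; len_[p] = len;
    lft[p] = lp; rgt[p] = rp;
    selected[p] = 0;
    pre_head[p] = post_head[p] = NONE;
    return p;
}

/* Select p, then generate every (x)Q(y) whose two halves are now selected. */
static void select_path(int p, long dist[][MAXV])
{
    selected[p] = 1;
    dist[first_[p]][last_[p]] = len_[p];
    if (lft[p] == NONE) return;                /* trivial path */
    int q1 = rgt[p], q2 = lft[p];
    /* p is a pre-extension of q1 = p \ first(p) */
    for (int p2 = post_head[q1]; p2 != NONE; p2 = post_next[p2])
        generate(first_[p], last_[p2], len_[p] + len_[p2] - len_[q1], p, p2);
    pre_next[p] = pre_head[q1];
    pre_head[q1] = p;
    /* p is a post-extension of q2 = p \ last(p) */
    for (int p1 = pre_head[q2]; p1 != NONE; p1 = pre_next[p1])
        generate(first_[p1], last_[p], len_[p1] + len_[p] - len_[q2], p1, p);
    post_next[p] = post_head[q2];
    post_head[q2] = p;
}

/* w[u][v] < 0 means "no edge"; dist[s][t] = LONG_MAX if t is unreachable. */
void static_apsp(int n, long w[][MAXV], long dist[][MAXV])
{
    int trivial[MAXV];
    assert(n <= MAXV);
    npaths = 0;
    for (int s = 0; s < n; s++)
        for (int t = 0; t < n; t++) dist[s][t] = LONG_MAX;
    for (int v = 0; v < n; v++) {              /* trivial paths are generated */
        trivial[v] = generate(v, v, 0, NONE, NONE);
        select_path(trivial[v], dist);
    }
    for (int u = 0; u < n; u++)                /* an edge minus either end is trivial */
        for (int v = 0; v < n; v++)
            if (u != v && w[u][v] >= 0)
                generate(u, v, w[u][v], trivial[u], trivial[v]);
    for (;;) {                                 /* pick a shortest improving path */
        int best = NONE;
        for (int p = 0; p < npaths; p++)
            if (!selected[p] && len_[p] < dist[first_[p]][last_[p]] &&
                (best == NONE || len_[p] < len_[best]))
                best = p;
        if (best == NONE) break;
        select_path(best, dist);
    }
}
```

On a 4-vertex example with edges 0→1 (1), 1→2 (2), 0→2 (5), 2→3 (1), 3→0 (7) and 1→3 (6) it yields, for instance, dist[3][2] = 10 via 3, 0, 1, 2; the candidate path (0, 2) of length 5 is generated but never selected, because (0, 1, 2) of length 3 is selected first.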

Lemma 3. When all selected paths are shortest, the total number of generated
paths is O(n^3). In particular, it takes O(n^3 log n) time to select all shortest paths
in the static case.

Proof. Since we have n vertices, the first part is a direct corollary of Lemma 2.
Also, there are at most n^2 shortest paths to select. We spend O(log n) time per
system path, so the total running time is O(n^3 log n). □

2.7 Efficiency of the Incremental Case 

Consider the incremental case of the simple algorithm, that is, no deletes are
allowed. When a vertex v is inserted, all new selected paths are shortest paths
containing v. Then Lemma 2 implies that we create at most O(n^2) system paths,
and that takes O(n^2 log n) time. That is,

Lemma 4. The basic algorithm supports an insert in O(n^2 log n) time. □

2.8 Efficiency of the Decremental Case 

Now consider the decremental case of the basic algorithm, that is, no inserts are 
allowed. Here, we first run the static algorithm as in §2.6. Each time a vertex 
is deleted, all system paths containing it are destroyed. Afterwards the basic 
algorithm selects all new shortest paths. The crucial observation is that selected 
paths remain shortest until destroyed. 

Lemma 5. Starting with a graph with n vertices, the basic algorithm can support
up to n deletions in O(n^3 log n) total time.

Proof. By Lemma 3, there can be at most O(n^3) generated paths when the
deletions are completed. All other generated paths are destroyed by deletions.
By Lemma 2, each deletion destroys O(n^2) generated paths. Consequently, the
total number of path changes is O(n^3), so the total running time is O(n^3 log n). □
2.9 The Fully-Dynamic Case

Unfortunately, the basic algorithm is not efficient in the fully-dynamic case. One 
can construct a sequence of n vertex inserts followed by n vertex deletes so that 
each delete makes Ω(n^3) path changes. The problem is that a shortest path
selected after one insert may not be shortest after some future insert. With such 
non-shortest selected paths, the efficiency of deletions breaks down. 

The approach of Demetrescu and Italiano [2] to the fully-dynamic APSP
problem is that when a vertex v is inserted, for i = 0, 1, ..., we wait for 2^i
updates, and then we make an extra dummy update on v. The dummy update
deletes and inserts v with the same edges. The effect is to de-select paths through
v that are no longer shortest. With this grooming, they generalize Lemma 2 to
show that there can be at most O(n^2 log n) generated paths through any vertex.
Consequently, no update or dummy update can destroy more than O(n^2 log n)
paths. Since each real update gives rise to O(log n) dummy updates, they get
O(n^2 log^2 n) path updates per real vertex update, hence an amortized update
time of O(n^2 log^3 n).
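The dummy-update schedule described above can be made concrete. This is a minimal sketch under our own assumptions: the waiting periods count only subsequent real updates, and the function name `dummy_update_offsets` is hypothetical. It shows why one insertion triggers only a logarithmic number of dummy updates within a horizon of T updates.

```python
def dummy_update_offsets(horizon: int) -> list[int]:
    """Offsets (in real updates after v is inserted) at which the grooming
    scheme would perform a dummy update on v: wait 2^0 updates, then 2^1,
    then 2^2, ... Only offsets <= horizon are returned."""
    offsets = []
    waited = 0
    i = 0
    while True:
        waited += 2 ** i          # wait for 2^i further updates
        if waited > horizon:
            return offsets
        offsets.append(waited)    # a dummy update on v happens here
        i += 1


# Offsets are 1, 3, 7, 15, ...; there are floor(log2(horizon + 1)) of them.
print(dummy_update_offsets(20))   # [1, 3, 7, 15]
```

With a horizon polynomial in n, this gives O(log n) dummy updates per real update, which is the count used in the bound above.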

3 Our Basic Algorithm 

Our algorithm for the fully-dynamic APSP problem follows a general idea of 
Henzinger and King [5] reducing a fully-dynamic problem to a logarithmic num- 
ber of decremental problems. Henzinger and King’s idea was originally developed 
for the fully-dynamic minimum spanning tree problem. Here we use the idea in 
the context of the fully-dynamic APSP problem, essentially exploiting the
efficiency of the simple decremental algorithm in §2.8. In our first implementation,
we improve the APSP amortized update time to O(n^2 log^2 n). In the next section,
we will tune our implementation for sparse graphs, getting the claimed
amortized update time of O(n^2(log n + log^2(m/n))). Finally, we will sketch how
to deal with negative edge weights.



3.1 Dividing into Levels 

Updates are numbered t = 1, 2, 3, ..., and the birth date of a vertex is the number
of the update inserting it. The graph is rebuilt whenever t > 2n with n the
current number of vertices. We then set t = 1 and rebuild the graph with n
reinserts. Asymptotically, this does not affect our amortized time bounds.

We impose a standard binary hierarchy over the update sequence. We
say that level i is active after update t if bit i is set in t. Here bit 0 is the least
significant bit. Also, t activates the level of its least significant set bit, that is, if
L is the least significant set bit of t, then level L is inactive before t and active
after t. Also, t deactivates the active levels lower than L.

If level i is active, we let t_i denote the update that activated level i. Note
that if level j > i is also active, then t_j < t_i. Hence, among active levels, we
sometimes refer to active higher levels as older levels.
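The bit bookkeeping above is easy to check mechanically. A minimal sketch, assuming updates are numbered from 1 as in the text; the helper names are hypothetical. It prints, for t = 1, ..., 8, which level t activates and which lower levels it deactivates, and computes t_i as t with the bits below i cleared (a consequence of the definitions, not a formula stated in the text).

```python
def activated_level(t: int) -> int:
    """Level L activated by update t >= 1: index of the least significant set bit."""
    return (t & -t).bit_length() - 1


def active_levels(t: int) -> list[int]:
    """Levels active after update t: indices of the set bits of t."""
    return [i for i in range(t.bit_length()) if (t >> i) & 1]


def activation_time(t: int, i: int) -> int:
    """Update t_i that activated the active level i: t with all bits below i cleared."""
    return (t >> i) << i


for t in range(1, 9):
    L = activated_level(t)
    deactivated = [k for k in active_levels(t - 1) if k < L]
    print(f"t={t}: activates {L}, deactivates {deactivated}, active {active_levels(t)}")
```

For example, update 8 activates level 3 and deactivates levels 0, 1 and 2, and after update 7 the activation times of levels 2, 1, 0 are 4, 6, 7, so higher levels are indeed older.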
When we activate a level i, we construct a level i graph G_i as a copy of the
current graph G. The vertices from G_i are called level i vertices. While level
i is active, we do not add any vertices to G_i, but if a level i vertex is deleted
from G, it is also deleted from G_i. Thus G_i is a decremental dynamic graph.
We destroy G_i when level i is deactivated. We will often identify an active level
i with its decremental level graph G_i.
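To make the life cycle of the level graphs concrete, here is a sketch that tracks only vertex sets; edges, paths and priority queues are omitted, and `simulate_level_graphs` is a hypothetical name. Under these simplifications, each update may delete a vertex from G and from every active G_i, deactivates the levels below the one it activates, and then copies G into the newly activated level.

```python
def simulate_level_graphs(updates):
    """Track only the vertex sets of G and of the level graphs G_i.

    updates: sequence of ("ins", v) or ("del", v), numbered t = 1, 2, ...
    Returns the vertex set of G and a dict {active level i: vertex set of G_i}.
    """
    G = set()
    levels = {}
    for t, (op, v) in enumerate(updates, start=1):
        if op == "ins":
            G.add(v)
        elif op == "del":
            G.discard(v)
            for Gi in levels.values():
                Gi.discard(v)          # level graphs are decremental
        else:
            raise ValueError(f"unknown update {op!r}")
        L = (t & -t).bit_length() - 1  # level activated by update t
        for k in [k for k in levels if k < L]:
            del levels[k]              # deactivation destroys G_k
        levels[L] = set(G)             # G_L is a copy of the current graph
    return G, levels
```

After inserting a, b, c, level 1 holds {a, b} and level 0 holds {a, b, c}, so the older level graph is contained in the younger one; a subsequent deletion of a activates level 2 with the copy {b, c}.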

The vertices in G_i that were not in the previous level graph G_j are called
level i centers. More formally, we have active levels i and j with i < j and no
active levels between i and j (if no level older than i is active, we take t_j = 0
and G_j empty). Then the level i centers are the vertices inserted
during updates t ∈ (t_j, t_i]. We let C_i denote the set of level i centers. Then
G_j = G_i \ C_i. Clearly each vertex v is a center in exactly one level i, and we say
that v is centered in level i. Consider a path P. Let v be its youngest vertex,
centered in some level i. Then G_i is the oldest level graph containing P. We say
that P is centered in v, in level i, and in G_i. The basic goal of an active level i
is to identify the shortest paths in G that are centered in level i.
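Because t_i is t with the bits below i cleared, the centers of an active level i are exactly the vertices whose birth dates lie in an interval of length 2^i, so a level has at most 2^i centers (the fact used in the proof of Lemma 8). A hypothetical sketch, with birth dates standing in for vertices and assuming all birth dates lie in 1..t:

```python
def center_intervals(t: int) -> dict[int, range]:
    """For each level i active after update t, the birth dates (t_j, t_i] of the
    level i centers, where j is the next older active level (t_j = 0 if none)."""
    intervals = {}
    prev = 0                                   # t_j of the next older level
    for i in reversed(range(t.bit_length())):  # oldest (highest) level first
        if (t >> i) & 1:
            ti = (t >> i) << i                 # update that activated level i
            intervals[i] = range(prev + 1, ti + 1)
            prev = ti
    return intervals


def path_center_level(birth_dates, t: int) -> int:
    """Level a path is centered in: the center level of its youngest vertex."""
    youngest = max(birth_dates)
    return next(i for i, r in center_intervals(t).items() if youngest in r)
```

After update t = 6, level 2 owns birth dates 1..4 and level 1 owns 5..6, so a path whose youngest vertex was born at update 5 is centered in level 1.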

3.2 The Level Path System 

We will have a specialized level system of selected paths. Each selected path P
may be selected for any level i with P ⊆ G_i. Also P may be selected for the
current graph G. However, we have the requirement that if P is selected for G,
then P has to be selected for all levels i with P ⊆ G_i. Although this is not part
of the definition, in our algorithms, a path selected for a level i will always be
shortest in G_i.

A path P is generated by level i if it satisfies the following two conditions:

— P is centered in level i.

— if P is not a trivial path, then, no matter which end-point we remove from
P, we get a path selected for level i.

The first condition ensures that P can only be generated by a single level. If we
do not want to specify this level, we say that P is level generated.

The second condition implies that if P is level generated, it is also generated 
with the original definition from §2. However, the converse is not true, for an
originally generated path may not be generated by any level. Our restriction to
levels is the key to efficiency, but before turning to efficiency, we prove that our
level path system can be used to generate shortest paths. 

We say a path P in the current graph G is improving if it is shorter than 
any alternative selected for G. This redefinition of improving does not take into 
account paths that are only selected for levels. Analogously to Lemma 1, we get

Lemma 6. (a) Let Q be a shortest path in the current graph G which is not 
selected for G. Then Q has an improving level generated subpath R. In par- 
ticular, if there are no improving level generated paths, then all shortest paths 
are selected for G. 

(b) Let P have minimum length amongst all level generated improving paths. 
Then P is a shortest path in G which is not yet selected for G. 
Proof. To prove (a), let R be a minimal subpath of Q that is not selected for G.
Then R is improving. Let i be the level that R is centered in. If R is trivial, it is
generated by level i. Otherwise, removing either end-point from R, we get a path
S which is selected for G, but then S is also selected for level i, since S ⊆ R ⊆ G_i.
Consequently, R is generated by level i. With (a) settled, the proof of (b) is
identical to that of Lemma 1(b). □



3.2.1 Implementation. It is straightforward to modify the path system from
§§2.2-2.3 to deal with levels as described above. More precisely, we make an
independent path system for each level based on paths selected for that level.
However, in the level i path system, we have the restriction that two level i
selected paths may only be combined in a generated path if at least one of them
is centered in level i. An efficient implementation requires that level i selected
post-extensions P_2 of a path Q are divided depending on whether P_2 is centered
in level i. Consider a new level i selected pre-extension P_1 of Q. If P_1 is centered
in level i, we generate P_1 ∪ P_2 for all level i selected post-extensions P_2 of Q;
otherwise, we only use those P_2 that are centered in level i.
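The case split for combining extensions can be sketched as follows. This is a minimal illustration, not the paper's data structure: paths are opaque objects, the caller is assumed to know whether each path is centered in level i, and `PostExtensions` is a hypothetical name. Keeping two buckets means an uncentered P_1 never scans post-extensions it may not be combined with.

```python
from dataclasses import dataclass, field


@dataclass
class PostExtensions:
    """Level i selected post-extensions P2 of a fixed path Q, kept in two
    buckets according to whether P2 is centered in level i."""
    centered: list = field(default_factory=list)
    other: list = field(default_factory=list)

    def add(self, p2, p2_centered: bool) -> None:
        (self.centered if p2_centered else self.other).append(p2)

    def partners(self, p1_centered: bool) -> list:
        """Post-extensions P2 to combine with a new pre-extension P1.

        P1 ∪ P2 is generated by level i only if P1 or P2 is centered in
        level i, so an uncentered P1 only scans the centered bucket."""
        if p1_centered:
            return self.centered + self.other
        return list(self.centered)
```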

The start-finish priority queue Q(s,t) now has all s-t paths that are either
level generated or selected for the current graph. If the shortest system path P
in Q(s,t) is not selected for the current graph, then P participates in the global
priority queue Q_G.



3.3 Fully-Dynamic APSP with the Level Path System 

Given the above level path system, we have a simple fully-dynamic APSP algo- 
rithm. The system starts with an empty graph, no level graphs, and an empty 
level path system. 

To process an update t, our first action is to de-select all paths from the 
current graph. The update t activates some level L and deactivates all levels 
K < L. To execute the deactivation, we destroy all system paths through the 
level K centers, and de-select all other paths from level K. If the update t deletes 
a vertex v, we also destroy all system paths containing v. 

Next we activate level L. If the update t inserts a vertex, it becomes a level 
L center along with the centers from the deactivated levels. Each trivial path 
consisting of a level L center is generated immediately by level L. Now, as long as 
the global priority queue Q_G is non-empty, we pick its shortest path P. Then we
select P for the current graph G and for all levels i with P ⊆ G_i. By Lemma 6,
the above process selects exactly the shortest paths in the current graph G.

3.4 Analysis 

We note that we only select a path P for an active level i when P is shortest in the
current graph G. Trivially, this implies that P is shortest in the subgraph G_i,
and since G_i is decremental, P remains shortest until G_i is deactivated. Thus, we
have
Invariant 1. All paths selected for level i are shortest paths in G_i. □

With this invariant, Lemma 2 applies to all paths generated by level i.
Consequently,

Lemma 7. On a given level, we have only O(n^2) system paths through any
vertex. □

Lemma 8. The number of paths generated by level i is always O(2^i n^2). Moreover,
O(2^i n^2) bounds the number of changes to paths generated by level i while
i is in an active period, including the deactivation at the end.

Proof. A path is only generated by level i if it goes through one of the at most
2^i level i centers. By Lemma 7, there can never be more than O(n^2)
generated paths through any such center, so we can have at most O(2^i n^2) level i
generated paths. By Lemma 7, we also get that each of the at most 2^i deletions
can destroy at most O(n^2) level i generated paths. As in the proof of Lemma 5,
we conclude that O(2^i n^2) bounds the total number of changes to paths generated
by level i. □

We are now ready to analyze our total cost. 

Lemma 9. The above APSP algorithm supports each vertex update in
O(n^2 log^2 n) amortized time.

Proof. We analyze the cost as follows. 

— At each update, we identify O(n^2) shortest paths P. Each P is selected for
the current graph with a priority queue cost of O(log n). Also, P is selected
at constant cost for O(log n) levels. Thus, the total cost of selecting shortest
paths is O(n^2 log n).

— A level i is active for 2^i vertex updates, and in this period, by Lemma 8, we
have O(2^i n^2) changes to paths generated by level i, that is O(n^2) changes
per update. Each such change costs O(log n) in the priority queues, so over
the O(log n) levels i, we have a cost of O(n^2 log^2 n) per vertex update.

Adding up the above items, we conclude that we spend O(n^2 log^2 n) amortized
time on each vertex update. We note that a level i alternates between active
and inactive periods of 2^i updates, starting with an inactive period. Hence,
in case we stop in the beginning of an active period, we can amortize the active
work over the preceding 2^i inactive updates. □
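Written out, the amortized accounting in the proof of Lemma 9 sums as follows. This is our own restatement; it assumes, as in §3.1, that t ≤ 2n, so only levels i ≤ log_2(2n) ever become active.

```latex
\underbrace{O(n^2)\cdot O(\log n)}_{\text{selecting for } G}
+ \underbrace{O(n^2)\cdot O(\log n)}_{\text{selecting for levels}}
+ \sum_{i=0}^{\lfloor \log_2 (2n) \rfloor}
    \underbrace{\frac{O(2^i n^2)}{2^i}}_{\text{changes per update}} \cdot O(\log n)
= O(n^2 \log n) + O(\log n)\cdot O(n^2 \log n)
= O(n^2 \log^2 n).
```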

3.5 Faster or Better Analysis? 

We have now obtained an amortized bound that is a factor log n better than the
one originally provided in [1] (cf. §2.9). One may ask if this is just a better
analysis or if the algorithm really has a better worst-case performance. We believe
the latter for the following reason: Our division into levels can be viewed as very
similar to the exponentially spaced dummy updates from [1]. When we activate
a level L, we destroy all selected paths through the centers on deactivated levels
K < L, and this is analogous to a dummy update. Thus, we end up with a similar
set of selected paths. However, in [1], arbitrary selected paths can be combined in
generated paths. Here we only combine paths selected for the same level. Thus it
seems plausible that we save a factor of log n in the number of generated paths.

4 Tuning for Sparse Graphs 

We are now going to tune the above algorithm to be more efficient for sparse graphs,
reducing the running time to O(n^2(log n + log^2(m/n))). From the preceding
analysis of Lemma 9, we know that the only cost that exceeds O(n^2 log n) is
that of the level generated paths. Below, we will first reduce the number of
these paths by reducing the number of levels to O(log(m/n)). Second, we will
reduce the size of most priority queues to O(m/n), thereby reducing their cost
to O(log(m/n)). These two improvements will reduce the overall update cost to
O(n^2(log n + log^2(m/n))). We note that this only improves over our previous
O(n^2 log^2 n) bound if m/n = n^{o(1)}. Hence we can assume that we are dealing
with a sparse graph with m̄/n = n^{o(1)}.

4.1 Fewer Levels 

In order to benefit from the sparseness of a graph, we are going to divide our updates
into epochs of length O(m̄/n). More precisely, when an epoch starts, we set it
to run for q = ⌈m̄/(2n)⌉ vertex updates. During this period, neither m̄ = m + n nor
n can change by more than a factor 2, so we preserve q = O(m̄/n). We also
note that m̄ ≥ n, so q is at least 1.

Before the first update t_s of an epoch, we copy the current graph G into
a decremental base graph G_b. During the epoch, the base graph is treated like
an oldest active level graph. However, all vertices in G_b are viewed as centers,
so any path in G_b is viewed as centered in G_b. Since an epoch has only m/n
updates, it will never activate more than log_2(m/n) regular levels.
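The epoch length and the resulting number of regular levels can be checked numerically. A small sketch with hypothetical helper names; we assume the update counter restarts at 1 with each epoch, which is how we read the claim about regular levels. The q updates of an epoch then activate exactly floor(log_2 q) + 1 distinct levels.

```python
def epoch_length(m: int, n: int) -> int:
    """q = ceil(m_bar / (2n)) with m_bar = m + n; requires n >= 1."""
    m_bar = m + n
    return -(-m_bar // (2 * n))  # ceiling division on integers


def levels_activated(q: int) -> set[int]:
    """Regular levels activated by updates 1..q of one epoch, assuming the
    update counter restarts at each epoch (an assumption of this sketch)."""
    return {(t & -t).bit_length() - 1 for t in range(1, q + 1)}


q = epoch_length(m=1000, n=100)   # m_bar = 1100, so q = ceil(1100 / 200) = 6
print(q, sorted(levels_activated(q)))   # 6 [0, 1, 2]
```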

All our preceding analysis of efficiency relied on each level having no more
centers than the number of updates while active. However, G_b may have Ω(n)
centers, and an epoch only lasts for m/n updates. For our sparse graphs,
m/n = n^{o(1)}, so we need a different analysis for G_b.

Lemma 10. The total number of paths generated by the base graph is at most
n(m + 1) ≤ nm̄.

Proof. As in Invariant 1, all paths selected for the base graph are shortest. There
are n trivial paths. Consider any non-trivial path P generated by the base. Let
(u, v) be the first edge of P and w the last vertex. The segment P[v, w] is selected
for the base, hence the unique shortest path from v to w, so P is determined by
the edge (u, v) and the vertex w. Consequently there are at most nm non-trivial
base generated paths. □

Lemma 11. During an epoch, we have O(m̄n) paths generated by the base, and
they cost O(mn log n) time in the priority queues.
Proof. By Lemma 10, when an epoch ends, we have at most nm̄ remaining base
generated paths to be destroyed. Also, applying Lemma 7 to the base, we get
that each of the O(m̄/n) vertex deletions destroys O(n^2) base generated paths.
Hence the total number of base generated paths during the epoch is
O(nm̄ + (m̄/n)n^2) = O(m̄n), each costing O(log n) in the priority queues. □



Lemma 12. With the base graph, our fully-dynamic APSP algorithm supports
each vertex update in O(n^2 (log n)(log(m/n))) amortized time and in O(m̄n)
space.

Proof. In the analysis for Lemma 9, we showed that each update takes
O(n^2 log n) amortized time on each level. Now that we have only O(log(m/n))
levels, this amortized update time is hence reduced to O(n^2 (log n)(log(m̄/n))).
By Lemma 11, the work in the base during an epoch takes O(mn log n) time. In
case we stop in the middle of an epoch, we amortize this work over the previous
epoch, which was completed with Θ(m̄/n) updates. The first epoch has an empty
base, hence no base work to amortize. Thus, in the base, the amortized work per
update is O(n^2 log n). Here the first epoch is a single insert in an empty graph,
and it pays for itself in constant time. Adding up, we support each update in
O(n^2 (log n)(log(m/n))) amortized time.

Apart from the level generated paths, the space used for each level or base
graph is O(n^2), adding up to O(n^2 log(m/n)) = O(mn). By Lemma 10, there
are O(mn) paths generated by the base graph, and by Lemma 8, there are
Σ_i O(2^i n^2) = O(m̄n) paths generated by the non-base levels, since only
levels with 2^i = O(m̄/n) occur. Thus we conclude that the total space is O(m̄n). □

4.2 Reducing the Priority Queue Cost 

As suggested in [1], it is straightforward to reduce the total cost of using the
global priority queue Q_G to O(n^2 log n) per vertex update if we use a Fibonacci
heap [4] supporting inserts and decreases in constant time. A minor change to
the processing of an update is that we should postpone entering paths into the
global priority queue until after we have destroyed all the paths through
deactivated centers and a deleted vertex.

Our challenge is to reduce the cost from the start-finish priority queues.
The trick is to split each Q(s,t) into a small priority queue Q_small(s,t) with many
changes, and a large priority queue Q_large(s,t) with few changes. Then

$$\min Q(s,t) = \min\{\min Q_{\mathrm{small}}(s,t),\ \min Q_{\mathrm{large}}(s,t)\}.$$

The small priority queue contains any s-t path
generated by a level i where neither s nor t is a center. It may also contain the
unique shortest s-t path if it is selected for the current graph. The large priority
queue contains all remaining level generated s-t paths. These are either from the
base graph, or they are from a level graph G_i where s or t is a center.
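A minimal sketch of a split start-finish queue, assuming paths are hashable identifiers and using Python's `heapq` with lazy deletion as a stand-in for the paper's priority queues; the class names are hypothetical. It only shows that the minimum of Q(s,t) is the smaller of the two minima, so each change touches just one of the two queues; routing a path to the small or the large queue follows the rule in the text and is left to the caller.

```python
import heapq
from itertools import count


class LazyHeap:
    """Binary min-heap with lazy deletion; items must be hashable."""

    def __init__(self):
        self._heap = []        # entries (key, tiebreak, item)
        self._live = {}        # item -> tiebreak of its current entry
        self._tie = count()

    def push(self, key, item):
        tb = next(self._tie)
        self._live[item] = tb  # a re-push supersedes any older entry
        heapq.heappush(self._heap, (key, tb, item))

    def remove(self, item):
        self._live.pop(item, None)

    def min(self):
        # Pop stale entries (removed or superseded) sitting at the top.
        while self._heap and self._live.get(self._heap[0][2]) != self._heap[0][1]:
            heapq.heappop(self._heap)
        return (self._heap[0][0], self._heap[0][2]) if self._heap else None


class SplitQueue:
    """Q(s,t) as a small and a large queue; its minimum is the smaller minimum."""

    def __init__(self):
        self.small = LazyHeap()
        self.large = LazyHeap()

    def min(self):
        candidates = [x for x in (self.small.min(), self.large.min()) if x is not None]
        return min(candidates, key=lambda x: x[0]) if candidates else None
```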

Lemma 13. A small priority queue has at most m̄/n paths. Hence small
priority queue changes take O(log(m̄/n)) time.
Proof. Each path P in the small priority queue for s and t is generated by a
non-base level i where neither s nor t is a center. Hence P has a level i center v
distinct from s and t. This means that P[s,v] and P[v,t] are selected for level i.
Hence, by Invariant 1, these segments are shortest paths in G_i. Thus P is uniquely
determined by a center v from some non-base level i. An epoch creates only m̄/n
such centers, so we conclude that the size of the small priority queue is at most
m̄/n. □



Lemma 14. The large priority queues have O(n^2) changes per vertex update,
hence a cost of O(n^2 log n).

Proof. From Lemma 10, we know that we only have O(n^2) changes to paths
generated by the base. However, in the large priority queues we also have
paths P generated by a level i where either s or t is a center. By symmetry, we
may assume that s is a level i center. Let v be the second to last vertex in P. Then
P[s,v] is selected, hence the unique shortest path in G_i, and (v,t) is an edge.
Stepping back, we can characterize P by the vertex s, the graph G_i, and the edge
(v,t). The vertex s is one of the m/n vertices inserted during the epoch, and during
the epoch, it becomes a center of O(log(m/n)) level graphs G_i. Consequently, the
number of such paths is O((m/n) m log(m/n)) = O((m^2/n) log(m/n)) = O(n^{1+o(1)})
since m = n^{1+o(1)}. □



Theorem 2. With the base graph and the split priority queues, we solve the
fully-dynamic APSP problem supporting each vertex update in O(n^2(log n +
log^2(m/n))) amortized time and in O(mn) space. Our algorithm runs on a
comparison-addition based pointer machine.

Proof. We consider the amortized cost of a vertex update. The space is already
established in Lemma 12. From the analysis of Lemma 9, we know that it is
only changes to level generated paths that can exceed O(n^2 log n). Their cost in
the global priority queue is O(n^2 log n), and by Lemma 14, the cost of the large
priority queues is O(n^2 log n). All other changes are to small priority queues.
By Lemma 13, their priority queue cost is O(log(m/n)). From the analysis of
Lemma 9, we know that there are O(n^2) changes to paths generated by each of
the O(log(m/n)) levels. Thus our total cost per vertex update is O(n^2(log n +
log^2(m/n))). □
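The sum concluding the proof of Theorem 2, written out term by term (our restatement of the proof's accounting):

```latex
\underbrace{O(n^2 \log n)}_{\substack{\text{selection, global queue,}\\ \text{large queues}}}
+ \underbrace{O(\log(m/n))}_{\text{levels}}
  \cdot \underbrace{O(n^2)}_{\text{changes per level}}
  \cdot \underbrace{O(\log(m/n))}_{\text{small-queue operation}}
= O\bigl(n^2(\log n + \log^2(m/n))\bigr).
```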

We note that if the weights are integers, or floating point numbers in standard
representation, we can use the priority queue from [9], reducing the delete time
to O(log log n) while keeping the insert and decrease-key time constant. This
improves our amortized time bound to O(n^2(log log n + log(m/n) log log(m/n)))
per vertex update.

5 Concluding Remarks 

We have reduced the amortized update time for the fully-dynamic APSP problem
to O(n^2(log n + log^2(m/n))). However, we use O(mn) space and we would like
to reduce this to O(n^2) space. For many previous fully-dynamic APSP and
transitive closure algorithms, the space was reduced from O(mn) to O(n^2) in
[7]. In particular, this gave O(n^2) space for King's APSP algorithm [6] which
has an update time of with unit weights. Unfortunately, the reduction
breaks down for the current approach with generated paths.

Another interesting problem is to get worst-case time bounds for each individual
update. All current approaches are lazy, and have worst-case update times as
slow as a static algorithm. Using some of the ideas from this paper, the author
has found better worst-case update times for the fully-dynamic APSP problem
[10].

Finally, we mention the challenge of getting a fully-dynamic single source 
shortest path algorithm with sublinear query and update times. This is open even 
with amortized time bounds and in the case where we just want to maintain
the shortest path from a fixed source s to a fixed destination t. 
Abstract. We study the gossiping problem in directed ad-hoc radio networks.
Our main result is a deterministic algorithm that solves this problem in an
n-node network in time O(n^{4/3} log^4 n). The algorithm allows
the labels (identifiers) of the nodes to be polynomially large in n, and
is based on a novel way of using selective families. The previous best
general (i.e., dependent only on n) deterministic upper bounds were
O(n^{5/3} log^3 n) for networks with polynomially large node labels [1], and
O(n^{3/2} log^2 n) for networks with linearly large node labels [2,3,4].



1 Introduction 

The two classical problems of information dissemination in computer networks 
are broadcasting and gossiping. In the broadcasting problem, we want to dis- 
tribute a message from a distinguished source node to all other nodes in the 
network. In the gossiping problem, each node v in the network initially holds a 
message m_v, and we want to distribute all messages m_v to all nodes in the
network. For both problems, an important performance measure is the time needed
to complete the required task. 

We consider the following model of a radio network. A network is a directed, 
strongly-connected graph G = (V, E), where V represents the set of nodes of the
network, and E contains an (ordered) pair of distinct nodes (v,w) ∈ V × V iff
node v can directly send a message to node w. If (v,w) ∈ E, then we say that
w is a neighbour of v and v is an in-neighbour of w. The total number of the
in-neighbours of a node w is its in-degree, and the maximum in-degree of a node
is called the max-indegree of the network. The size of the network is the number
of nodes n = |V|. Each node v ∈ V is labelled by a distinct positive integer. The
set of nodes directly reachable from a node v ∈ V is the range of v. One of the
radio network properties is that a message transmitted by a node is always sent
to all nodes within its range.

The communication in the network is synchronous and consists of a sequence 
of (communication) steps. During one step, each node v either transmits or
listens. If v transmits, then the transmitted message reaches each of its neighbours
by the end of this step. However, a node w in the range of v successfully receives
this message iff in this step w is listening and v is the only transmitting node
which has w in its range. If node w is in the range of a transmitting node but
is not listening, or is in the range of more than one transmitting node, then a
collision occurs and w does not retrieve any message in this step. Moreover, the
“noise of collision” is indistinguishable from the “background noise” experienced
by a node which is not in the range of any transmitting node (that is, the nodes
do not have a collision detection mechanism). Dealing with collisions is one of
the main challenges in efficient radio communication. A commonly used tool for
coping with collisions is the concept of selective families of sets [5,6,2,1]. In
Section 2 we recall this concept and introduce a novel, more powerful way of
using it.
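The reception rule can be stated as a small simulation. A minimal sketch with a hypothetical name and data layout (in-neighbour sets per node, a set of transmitting nodes, per-node knowledge sets); following the simplification adopted below, a transmitter sends everything it knows.

```python
def radio_step(in_neighbours, transmitting, knowledge):
    """One synchronous step of the radio model.

    in_neighbours: dict node -> set of in-neighbours of that node.
    transmitting: set of nodes transmitting in this step; all others listen.
    knowledge: dict node -> set of known messages; updated in place.
    A listening node w receives iff exactly one in-neighbour of w transmits.
    With zero or several transmitting in-neighbours, w hears nothing and
    cannot tell the two cases apart (no collision detection).
    Returns dict w -> set of messages received in this step."""
    received = {}
    for w, ins in in_neighbours.items():
        if w in transmitting:
            continue                          # a transmitting node does not listen
        senders = ins & transmitting
        if len(senders) == 1:
            (v,) = senders
            received[w] = set(knowledge[v])   # v sends its whole knowledge
    for w, msgs in received.items():          # apply after all receptions are decided
        knowledge[w] |= msgs
    return received
```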

The (communication) time of an algorithm is the number of communication 
steps required to complete the algorithm. That is, we do not account for any 
internal computation within individual nodes. Another abstraction of the model
is that there is no limit on the length of a message which one node can transmit in one step.
Actually, to simplify the presentation of algorithms, we assume that if a node 
transmits in the current step, it transmits its whole knowledge. 

The algorithms we present in this paper are for ad-hoc radio networks: the 
topology of connections is unknown in advance. At the beginning of an algorithm 
each node knows only its label, its initial message, and the global bound N on 
the node labels. We assume that N = O(n^p) for some constant p.



1.1 Previous Work 

The broadcasting problem has attracted considerably more attention than the 
gossiping problem. For networks with linearly bounded labels (N = O(n)), the
trivial O(n^2) upper bound on broadcasting was first improved by Chlebus et
al. [7] to O(n^{11/6}). The subsequent improvements included an time
algorithm proposed by De Marco and Pelc [8], an time algorithm proposed
by Chlebus et al. [5], and an O(n log^2 n) time algorithm developed by
Chrobak, Gąsieniec and Rytter [2]. Clementi, Monti and Silvestri [6] presented
a deterministic broadcasting algorithm for ad-hoc radio networks which works
in time O(DΔ), where D is the diameter of the network (the number of edges
on the longest shortest path) and Δ is the maximum in-degree of a node. The
O(n log^2 n) and O(DΔ) algorithms, presented, respectively, in [2] and [6], can
be easily adapted to work within the same asymptotic times for polynomially
bounded node labels. Bruschi and Del Pinto [9] showed that for any deterministic
algorithm A for broadcasting in ad-hoc radio networks, there are networks on
which A requires Ω(n log n) time.

The first sub-quadratic deterministic algorithm for the gossiping problem in
ad-hoc radio networks was the Õ(n^{3/2})¹ time algorithm proposed by Chrobak
et al. [2]. Subsequently Xu [4] improved this bound by a polylogarithmic factor
obtaining a bound. For small values of diameter D, the gossiping time
was later improved by Gąsieniec and Lingas [3] to Õ(nD^{1/2}). The gossiping
algorithms presented in [2,4,3] assume that the node labels are linear in n and we do
not see how they could be extended to the case where node labels are polynomially
large. Clementi, Monti and Silvestri [6] presented an Õ(DΔ^2)-time deterministic
gossiping algorithm, and subsequently Gąsieniec and Lingas [3] showed an
Õ(DΔ^{3/2}) algorithm. Both these algorithms work for polynomially large node
labels. Prior to this paper, the best general (dependent only on n) bound on a
deterministic algorithm for gossiping in ad-hoc networks with polynomially large
node labels was due to Gąsieniec, Pagourtzis and Potapov [1].

¹ Notation Õ(f(n)) denotes a function in O(f(n) log^c n) for a constant c. In all cases
when we use this notation in this paper, constant c is at most 4.

A study on deterministic gossiping in unknown radio networks with messages
of limited size can be found in [10]. The gossiping problem in ad-hoc radio
networks has also attracted studies on efficient randomised algorithms. Chrobak et
al. [2] proposed an O(n log^4 n) time randomised gossiping algorithm. This time
was later reduced to O(n log^3 n) in [11], and then to O(n log^2 n) in [12]. This
shows a relatively large gap between the best known deterministic and
randomised algorithms for gossiping.

We also mention some results for communication in the model where the 
network topology is known to all nodes in advance. Gaber and Mansour [13]
showed that in such a model the broadcasting task can be completed in time
O(D + log^5 n). Diks et al. [14] proposed efficient radio broadcasting algorithms
for (various) particular types of network topologies. The gossiping problem was
not studied in the context of known radio networks until the very recent work of
Gąsieniec and Potapov [15]. One can find there a study of the gossiping problem
in known radio networks, where each node transmission is limited to unit
messages. In this model they proposed several optimal and almost optimal O(n)-time
gossiping algorithms for various standard network topologies, including
lines, rings, stars and free trees. They also proved that there exists a radio
network topology in which the gossiping (with unit messages) requires Ω(n log n)
time.

1.2 Our Results 

In this paper we present a deterministic algorithm that solves the gossiping
problem in directed ad-hoc radio networks with polynomially large node labels in time
O(n^{4/3} log^4 n). This is the fastest currently known deterministic radio
gossiping algorithm in graphs with an arbitrary topology. The previous best algorithm
for this task requires Õ(n^{5/3}) time [1]. Our algorithm also improves the previous
best upper bound Õ(n^{3/2}) for gossiping in ad-hoc networks with node labels
only linearly large [2]. The algorithm is based on an extension of the concept of
strongly selective families [5,6] from star-like sub-graphs into sub-graphs with a
more general topology. We also show a simple Õ(nΔ)-time deterministic
gossiping algorithm, which improves Gąsieniec and Lingas' [3] upper bound of
Õ(min{nD^{1/2}, DΔ^{3/2}}), if Δ = and DΔ^{3/2} = for
some constant ε > 0.

The paper is organised as follows. In Section 2 we recall basic definitions on 
selectivity, selective families, and selectors. We also introduce a new notion of 
a path selector which forms a crucial part in our main gossiping algorithm. In
Section 3 we present our two algorithms. In Section 4 we briefly summarise the 
paper and mention some directions for further research in deterministic radio 
gossiping. 



2 Selectivity and Tools 

In this section we recall the mechanics of selectivity. In particular, we refer to the
definition of selectors, which form the backbone of a new combinatorial structure,
the path selector, a tool that is used in our new radio gossiping algorithm.

The neighbourhood of a node v is the set NB(v) consisting of node v and
all its in-neighbours. For any simple path P = ⟨v_0, v_1, ..., v_q⟩, the length of P,
denoted by length(P), is the number of edges on P. The neighbourhood of P is
defined as the union NB(P) of the neighbourhoods of nodes v_1, ..., v_q. Observe
that node v_0 belongs to NB(P) since it belongs to the neighbourhood of node
v_1, but an in-neighbour of node v_0 does not belong to NB(P), unless it is also
an in-neighbour of another node on P. If (u, v) ∈ E and u is the only node in
NB(v) transmitting in the current step, then v receives this transmission. For
path P, if v_i is a node on P other than the first node v_0, (u, v_i) ∈ E, and u is
the only node in NB(P) transmitting in the current step, then v_i receives this
transmission.

2.1 Selectors 

We say that a set $R$ hits a set $Z$ on element $z$ if $R \cap Z = \{z\}$, and a family $\mathcal{F}$ of sets hits a set $Z$ on element $z$ if $R \cap Z = \{z\}$ for at least one $R \in \mathcal{F}$.

De Bonis et al. [16] introduced a definition of a family of subsets of the set $[N] = \{0, 1, \ldots, N-1\}$ which hits each subset of $[N]$ of size at most $k$ on at least $m$ distinct elements, where $N$, $k$ and $m$ are parameters, $N \ge k \ge m \ge 1$. They proved the existence of such a family of size $O((k^2/(k-m+1))\log N) = O((k^2/(k-m+1))\log n)$. For convenience of our presentation, we prefer the following slight modification of this definition, obtained by using the parameter $r = k - m$ instead of the parameter $m$. For integers $N$ and $k$, and a real number $r$ such that $N \ge k \ge r \ge 0$, a family $\mathcal{F}$ of subsets of $[N]$ is an $(N,k,r)$-selector if, for any subset $Z \subseteq [N]$ of size at most $k$, the number of elements $z$ of $Z$ such that $\mathcal{F}$ does not hit $Z$ on $z$ is at most $r$. That is,

$$|\{z \in Z : R \cap Z \neq \{z\} \text{ for each } R \in \mathcal{F}\}| \le r.$$

In terms of this definition, De Bonis et al. [16] showed the existence of an $(N,k,r)$-selector of size $T(N,k,r) = O((k^2/(r+1))\log N)$. In particular, there exists an $(N,k,0)$-selector of size $O(k^2\log N)$; such a “strong” selector hits each set $Z \subseteq [N]$ of size at most $k$ on each of its elements. There also exists an $(N,k,ck)$-selector of size $O(k\log N) = O(k\log n)$, for any constant $0 < c < 1$; such a “weak” selector guarantees only that it hits each set $Z \subseteq [N]$ of size at most $k$ on at least a constant fraction of its elements.
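To make the $(N,k,r)$-selector definition concrete, the following Python sketch checks the property by brute force over all nonempty subsets $Z \subseteq [N]$ of size at most $k$. It is only an illustration for tiny $N$ (the enumeration is exponential); it is not how selectors of size $O((k^2/(r+1))\log N)$ are constructed, and the function names are hypothetical.

```python
from itertools import combinations


def hits(R, Z, z):
    """True if set R hits set Z on element z, i.e. R ∩ Z == {z}."""
    return R & Z == {z}


def is_selector(family, N, k, r):
    """Brute-force check of the (N, k, r)-selector property.

    For every nonempty Z ⊆ [N] with |Z| <= k, at most r elements z of Z
    may fail to be hit by the family. Exponential in N: tiny N only.
    """
    family = [frozenset(R) for R in family]
    for size in range(1, k + 1):
        for Z in map(frozenset, combinations(range(N), size)):
            missed = sum(1 for z in Z if not any(hits(R, Z, z) for R in family))
            if missed > r:
                return False
    return True
```

For example, the family of all singletons $\{0\}, \ldots, \{N-1\}$ is trivially a strong $(N,k,0)$-selector of size $N$, whereas the single set $\{0,1,2,3\}$ hits no 2-element subset of $[4]$ on any of its elements, so it is a $(4,2,2)$-selector but not a $(4,2,1)$-selector.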
2.2 Path Selectors

An interleaved sequence $S(A_0, A_1, \ldots, A_{p-1})$ of finite sets $A_i$ is an infinite sequence $(a_0, a_1, \ldots)$ obtained by arbitrarily ordering the elements of each set $A_i$ into a sequence $(a_{i,0}, a_{i,1}, \ldots, a_{i,q_i-1})$, where $q_i = |A_i|$, and setting $a_{i+jp} = a_{i,\, j \bmod q_i}$ for all integers $j \ge 0$ and $0 \le i \le p-1$. That is, the subsequence $(a_i, a_{i+p}, a_{i+2p}, \ldots)$ is a periodic sequence based on the ordered elements of set $A_i$, for each $i = 0, 1, \ldots, p-1$.
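The indexing rule $a_{i+jp} = a_{i,\,j \bmod q_i}$ can be illustrated with a short Python sketch that produces a finite prefix of an interleaved sequence. Each set $A_i$ is passed already ordered, as a nonempty list; the helper name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleaved_prefix(ordered_sets: Sequence[Sequence[T]], length: int) -> List[T]:
    """First `length` terms of S(A_0, ..., A_{p-1}).

    Position t = i + j*p (0 <= i < p, j >= 0) holds a_{i, j mod q_i},
    so every p-th term cycles through the ordered elements of A_i.
    """
    p = len(ordered_sets)
    prefix = []
    for t in range(length):
        i, j = t % p, t // p          # decompose t = i + j*p
        a_i = ordered_sets[i]
        prefix.append(a_i[j % len(a_i)])
    return prefix


print(interleaved_prefix([["a", "b"], ["x", "y", "z"]], 8))
# ['a', 'x', 'b', 'y', 'a', 'z', 'b', 'x']
```

In a path selector the "elements" of each $A_i$ are the transmission sets of the selector $\mathcal{F}_i$: every selector receives one step in each block of $p$ consecutive steps, so a small selector is cycled through many times while a large one is still being traversed.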

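A path selector is an interleaved sequence of properly chosen selectors. Let $\mathcal{F}_i$ be an $(N, k, k/2^{i+1})$-selector of size $T(N, k, k/2^{i+1})$, for $i = 0, 1, \ldots, \lceil \log k \rceil$. Also let

$$T_{N,k} = (\lceil \log k \rceil + 1) \sum_{i=0}^{\lceil \log k \rceil} \frac{k}{2^i}\, T\!\left(N, k, \frac{k}{2^{i+1}}\right) = (\lceil \log k \rceil + 1) \sum_{i=0}^{\lceil \log k \rceil} \frac{k}{2^i}\, O(2^{i+1} k \log N) = O(k^2 \log N \log^2 k).$$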

Definition 1. The prefix of length $T_{N,k}$ of an interleaved sequence $S(\mathcal{F}_0, \mathcal{F}_1, \ldots, \mathcal{F}_{\lceil \log k \rceil})$ forms a path selector $S_{N,k}$.

Note that, according to its definition, the length of the path selector $S_{N,k}$ is $O(k^2 \log N \log^2 k) = O(k^2 \log^3 n)$, since $N$ is polynomial in $n$ and $k \le n$ in our applications.
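The $\log^2 k$ factor in $T_{N,k}$ arises because every summand in its definition contributes the same amount. A small Python sketch makes the arithmetic explicit under a hypothetical simplification: the selector $\mathcal{F}_i$ is assumed to have size exactly $2^{i+1} k \log_2 N$, i.e., the $O(\cdot)$ upper bound with constant 1.

```python
import math


def path_selector_length(k: int, N: int) -> float:
    """T_{N,k} under the assumption |F_i| = 2^(i+1) * k * log2(N) (constant 1)."""
    L = math.ceil(math.log2(k))
    log_N = math.log2(N)
    total = 0.0
    for i in range(L + 1):
        uses = k / 2**i                   # F_i is needed at most k/2^i times
        size = 2 ** (i + 1) * k * log_N   # assumed size of F_i
        total += uses * size              # each term equals 2 * k^2 * log2(N)
    return (L + 1) * total                # factor for interleaving L + 1 selectors


print(path_selector_length(8, 2**10))  # 4 * 4 * 2 * 64 * 10 = 20480.0
```

With $\lceil \log k \rceil + 1$ equal terms of $2k^2\log N$ each, multiplied by the interleaving factor $\lceil \log k \rceil + 1$, the total is $2(\lceil \log k \rceil + 1)^2 k^2 \log N$, matching the $O(k^2\log N\log^2 k)$ bound.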

2.3 Extended Selectivity 

In previous work the selectivity properties of selectors were used in the context of star-like subgraphs, where a number of nodes (bounded by a certain value $k$) compete to communicate with one distinguished node $c$, the centre of the star. For example, a single application of a strong $(N,k,0)$-selector allows each of the competing nodes to transmit successfully to the centre $c$ in some step. To apply a family $\mathcal{F}$ of subsets of $[N]$ means first to arrange the sets of $\mathcal{F}$ in a sequence $F_1, F_2, \ldots, F_{|\mathcal{F}|}$. Then in step $i$, the nodes with labels in $F_i$ transmit, while the other nodes listen. In this paper we extend the notion of selectivity to graphs with larger eccentricity by proving the following lemma.
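Applying a family in the radio model can be sketched as a small simulation. The Python code below assumes, for illustration, that node labels coincide with node identifiers and that the graph is given as a map from each node to its in-neighbours. In each step a node $v$ receives from $u$ exactly when $u$ is the only transmitter in $NB(v)$, so a transmitting node never receives. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def receptions(in_nbrs: Dict[int, List[int]], transmitting: Set[int]) -> Dict[int, int]:
    """One radio step: map each receiving node v to its unique sender u.

    v receives from u iff u is the only transmitting node in
    NB(v) = {v} ∪ in-neighbours(v); a node that transmits never receives.
    """
    got = {}
    for v, ins in in_nbrs.items():
        senders = [u for u in {v, *ins} if u in transmitting]
        if len(senders) == 1 and senders[0] != v:
            got[v] = senders[0]
    return got


def apply_family(in_nbrs: Dict[int, List[int]],
                 family: Iterable[Set[int]]) -> List[Tuple[int, int, int]]:
    """Apply F_1, F_2, ...: in step i the nodes with labels in F_i transmit.

    Returns (step, receiver, sender) for every successful reception.
    """
    log = []
    for step, F in enumerate(family, start=1):
        for v, u in receptions(in_nbrs, set(F)).items():
            log.append((step, v, u))
    return log


# Star with centre 0 and competing leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [], 2: [], 3: []}
print(apply_family(star, [{1, 2}, {1}, {2, 3}, {3}]))  # [(2, 0, 1), (4, 0, 3)]
```

Leaf 2 never gets through because it always transmits together with another leaf. The singleton family $\{1\}, \{2\}, \{3\}$, a trivial strong selector for this star, delivers every leaf to the centre.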

Lemma 1. Let $P = \langle v_0, \ldots, v_{k'} = c \rangle$ be a directed simple path such that $|NB(P)| \le k$. A single application of the path selector $S_{N,k}$ allows node $v_0$ to deliver its own message $m_0$ along path $P$ to its endpoint $c$.

Proof. Let $Z(-1)$ be the set of the labels of the nodes in $NB(P)$. For $i = 0, \ldots, \lceil \log k \rceil$, let $Z(i) \subseteq Z(i-1)$ be the set of the labels in $Z(i-1)$ that are not hit by selector $\mathcal{F}_i$. Note that the set $Z(i-1) - Z(i)$ contains all labels that are not hit by any $\mathcal{F}_j$, for $j = 0, \ldots, i-1$, but are hit by $\mathcal{F}_i$.

According to the definition of the selectors $\mathcal{F}_i$, the cardinality of $Z(i-1)$ is at most $k/2^i$, so $|Z(i-1) - Z(i)| \le k/2^i$ too. Let $X_i = \{v_0, \ldots, v_{k'-1}\} \cap (Z(i-1) - Z(i))$, for $i = 0, \ldots, \lceil \log k \rceil$. Then $|X_i| \le k/2^i$ too. Observe that $Z(\lceil \log k \rceil)$ is empty, since its size is at most $k/2^{\lceil \log k \rceil + 1} \le 1/2$. Hence every label in $NB(P)$ is hit by some selector, so $\bigcup_{i=0}^{\lceil \log k \rceil} (Z(i-1) - Z(i)) = NB(P)$ and $\bigcup_{i=0}^{\lceil \log k \rceil} X_i = \{v_0, \ldots, v_{k'-1}\}$.

Note that the copy of message $m_0$ starting from node $v_0$ and proceeding along path $P$ will use selector $\mathcal{F}_i$ to progress from nodes in $X_i$, that is, at most $k/2^i$ times. Thus the total time spent by message $m_0$ in all sets $X_i$ (including the time multiplexing used for selector interleaving) is

$$(\lceil \log k \rceil + 1) \sum_{i=0}^{\lceil \log k \rceil} \frac{k}{2^i}\, T\!\left(N, k, \frac{k}{2^{i+1}}\right) = T_{N,k}.$$

Corollary 1. Let $P = \langle v_0, \ldots, v_{k'} = c \rangle$ be a directed simple path such that $|NB(P)| \le k$. A single application of the path selector $S_{N,k}$, which takes $O(k^2 \log^3 n)$ time, allows all nodes in $P$ to deliver their own messages along $P$ to the endpoint $c$.

We remark that prior to our paper, the best available upper bound for completing the communication task referred to in the above corollary was given by $k$ successive applications of a strong $(N,k,0)$-selector, that is, $O(k^3 \log n)$ time.

3 Faster Deterministic Gossiping 

The algorithms presented in this section use procedure Broadcast($v$), which distributes from $v$ all messages known to $v$ (that is, the message originating at $v$ and all messages received by $v$ so far) to all other nodes in the network. We say that a message is secured if it has already been communicated to all nodes in the network by an application of procedure Broadcast. Otherwise we say that the message is still active. A dormant node is a node whose original message is already secured, and an active node is a node which is not yet dormant. An active path is a simple path such that all nodes on this path other than the last one are active. The last node of an active path may be active or dormant.

Our algorithms use a quasi-gossiping principle. A quasi-gossiping procedure guarantees that on its completion every message that is not yet secured has been communicated to at least one dormant node. Observe that full gossiping can be completed by an application of a quasi-gossiping procedure followed by a second execution of all its transmissions in exactly the same order as in the first run: since every transmission carries all messages known to the transmitting node, in the second run each message held by a dormant node $v$ travels along with $v$'s original message, which the first run delivered to all nodes.

Another important component of our algorithms is the following procedure Disperse($x$), which is mainly responsible for distributing sufficiently large combined active messages (those containing at least $x$ active messages).

Disperse(x):
  repeat
    select a node v which has at least x active messages;
    if such a node v exists
      then Broadcast(v);
      else return.
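The control flow of Disperse($x$) can be sketched in Python on an abstract model that ignores radio timing. Here Broadcast($v$) is modelled, hypothetically, as instantly giving every node all messages known to $v$ and marking them secured. The sketch only illustrates the loop and its stopping condition (every node ends with fewer than $x$ active messages), not the $O(n\log^2 n)$ radio implementation of Broadcast or of the node selection.

```python
from typing import Dict, Set


def disperse(known: Dict[int, Set[int]], secured: Set[int], x: int) -> int:
    """Abstract Disperse(x); mutates `known` and `secured`, returns the number of Broadcast calls.

    known[v] is the set of message ids node v knows; a message is active if not in `secured`.
    """
    calls = 0
    while True:
        # select a node with at least x active messages, if any
        v = next((u for u, msgs in known.items() if len(msgs - secured) >= x), None)
        if v is None:
            return calls
        payload = set(known[v])
        for u in known:        # modelled Broadcast(v): every node learns what v knows
            known[u] |= payload
        secured |= payload     # ... and those messages become secured
        calls += 1


known = {0: {0, 1, 2}, 1: {1}, 2: {2, 3}, 3: {3}}
secured: Set[int] = set()
print(disperse(known, secured, 2), sorted(secured))  # 1 [0, 1, 2]
```

After the single broadcast from node 0, nodes 2 and 3 each still hold one active message (message 3), which is below the threshold $x = 2$, so the loop stops.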
Lemma 2. On the completion of procedure Disperse($x$), each node contains fewer than $x$ active messages. If an algorithm executes procedure Disperse($x$) $r$ times, then the total running time of these executions is $O((n/x + r)\, n \log^2 n)$.

Proof. The first part of the lemma is immediate.

The time complexity of procedure Broadcast is bounded by $O(n\log^2 n)$ [2]. Selection of a node containing at least $x$ active messages is done in time $O(n\log^2 n)$ by binary search combined with the broadcast procedure; for details see [2]. Thus each iteration takes $O(n\log^2 n)$ time. For each call to procedure Disperse($x$), each iteration other than the last one secures at least $x$ active messages, so the total number of iterations, over all calls, is at most $n/x + r$.



3.1 Gossiping in Time $O(n\Delta\log^2 n)$

We assume here that the max-indegree of the network is bounded by $\Delta$. The gossiping algorithm works in three phases reflecting the principle of quasi-gossiping. Phase I is based on applying a strong $(N, \Delta, 0)$-selector $k = \lceil (n\log n)/\Delta \rceil$ times. Phase II is a single application of procedure Disperse($k$). Phase III repeats all transmissions from Phases I and II.

Gossip1(n, Δ):
  Let k = ⌈(n log n)/Δ⌉;
  Phase I (move all messages along paths of length k):
    apply an (N, Δ, 0)-selector k times;
  Phase II (make at least every k-th node of an active path dormant):
    Disperse(k);
  Phase III: repeat all transmissions from Phases I and II.
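Phase I relies on each application of a strong selector advancing every message by at least one hop. The following Python sketch checks this on a small directed path and is only an illustration. As a stand-in for the $O(\Delta^2\log N)$-size strong selector it uses the trivial strong selector consisting of all singletons (size $N$), listed in an order chosen so that messages advance exactly one hop per application. Function names are hypothetical.

```python
from typing import Dict, List, Set


def radio_step(in_nbrs: Dict[int, List[int]], known: Dict[int, Set[int]],
               transmitting: Set[int]) -> Dict[int, Set[int]]:
    """One step: v learns what its unique transmitter in NB(v) knew before the step."""
    new = {v: set(m) for v, m in known.items()}
    for v, ins in in_nbrs.items():
        senders = [u for u in {v, *ins} if u in transmitting]
        if len(senders) == 1 and senders[0] != v:
            new[v] |= known[senders[0]]
    return new


def phase_one(in_nbrs: Dict[int, List[int]], selector: List[Set[int]], k: int) -> Dict[int, Set[int]]:
    """Apply the selector k times, starting with each node knowing only its own message."""
    known = {v: {v} for v in in_nbrs}
    for _ in range(k):
        for F in selector:
            known = radio_step(in_nbrs, known, F)
    return known


path = {0: [], 1: [0], 2: [1], 3: [2], 4: [3]}   # directed path 0 -> 1 -> 2 -> 3 -> 4
singletons = [{v} for v in sorted(path, reverse=True)]
print(sorted(phase_one(path, singletons, 2)[4]))  # [2, 3, 4]
```

After $k = 2$ applications, node 4 knows exactly the messages that started within distance 2 along the path. With the singletons in increasing label order a message could travel further within one application; the proof of Theorem 1 only uses the one-hop-per-application guarantee.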



Theorem 1. The algorithm Gossip1($n$, $\Delta$) performs radio gossiping in any ad-hoc network of size $n$ and max-indegree at most $\Delta$ in time $O(n\Delta\log^2 n)$.

Proof. Recall that the size of an $(N, \Delta, 0)$-selector is $O(\Delta^2\log n)$. Thus the running time of Phase I is $k \cdot O(\Delta^2\log n) = O(n\Delta\log^2 n)$. Lemma 2 (with $r = 1$) implies that the running time of Phase II is $O((n/k + 1)\, n\log^2 n) = O(n\Delta\log^2 n)$. Hence the total running time is $O(n\Delta\log^2 n)$. It remains to prove that the algorithm always completes gossiping. We prove this by showing that Phases I and II always complete quasi-gossiping.

If there is no simple path of length $k$ coming out of a node $v$, then on the completion of Phase I the message $m_v$ of $v$ is known to all other nodes in the network. If there is a simple path $P$ of length $k$ coming out of $v$, then at the end of Phase I, $m_v$ is known to all nodes on $P$. Note that on the completion of Phase II, at least one node on $P$ is no longer active. Otherwise the last node on $P$ would contain at least $k$ active messages, which is not possible after execution of procedure Disperse($k$). Thus $m_v$ must be known to at least one dormant node.
3.2 Gossiping in Time $O(n^{4/3}\log^{10/3} n)$

In this section we present our main deterministic radio gossiping algorithm, which works in graphs with an arbitrary topology. The framework of the algorithm is to keep transmitting active messages so that individual nodes accumulate more and more “local” messages, and to apply procedure Disperse($x$) periodically with appropriately chosen values of the threshold parameter $x$. The control of the “local” accumulation of active messages is initially done using weak selectors but at some point switches to path selectors.

The algorithm consists of three phases and follows the quasi-gossiping principle: Phases I and II complete quasi-gossiping, while Phase III is an exact repetition of all transmissions done in Phases I and II. Phase I is the same as the initial phase of the gossiping algorithm proposed by Gąsieniec et al. [1]. It repeatedly applies a weak $(N, q, q/4)$-selector followed by procedure Disperse($q/4$), for $q$ geometrically decreasing from $n$ to $k$. The value of the parameter $k$ will be set later. At the end of the $t$-th iteration, the size of the active neighbourhood of each node (that is, the number of active nodes in the neighbourhood of each node) is less than $n/2^t$, and at the end of the last iteration, the size of the active neighbourhood of each node is less than $k$. In Phase II we iterate, a logarithmic number of times, the path selector $S_{N,k}$ followed by procedure Disperse($k/2$), to reduce the active neighbourhoods of active paths. We show below (Lemma 6) that this computation results in delivery of all active messages to dormant nodes. The pseudo-code of the gossiping algorithm follows. From now on, the neighbourhood of a node or path refers to the active neighbourhood.

Gossip2(n):
  Phase I (reduction of neighbourhoods of nodes):
    q ← n;
    while q > k do
      the active nodes transmit according to an (N, q, q/4)-selector;
      Disperse(q/4);
      q ← q/2;
  Phase II (reduction of neighbourhoods of active paths):
    repeat ⌈log k⌉ + 1 times:
      the active nodes transmit according to the path selector S_{N,k};
      Disperse(k/2);
  Phase III: repeat all communication from Phases I and II.
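The parameter schedule of Gossip2 can be tabulated with a short Python sketch. It assumes, for simplicity, that $n$ and $k$ are powers of two so that all thresholds are integers, and it only lists the selector and Disperse parameters without simulating any transmissions; the function name is hypothetical.

```python
import math
from typing import List, Tuple


def gossip2_schedule(n: int, k: int) -> Tuple[List[Tuple[int, int]], int]:
    """Phase I pairs (q, Disperse threshold q/4) and the number of Phase II iterations."""
    phase1 = []
    q = n
    while q > k:
        phase1.append((q, q // 4))   # weak (N, q, q/4)-selector, then Disperse(q/4)
        q //= 2
    phase2_iterations = math.ceil(math.log2(k)) + 1   # S_{N,k}, then Disperse(k/2), each time
    return phase1, phase2_iterations


print(gossip2_schedule(64, 8))  # ([(64, 16), (32, 8), (16, 4)], 4)
```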



Lemma 3. At the end of Phase I, the size of the neighbourhood of each node is 
less than k. 

Proof. A simple inductive argument using the definition of a selector shows 
that at the beginning of each iteration of the loop in Phase I, the size of the 
neighbourhood of each node is less than q. 
Lemma 4. In Phase II, if at the beginning of an iteration the size of the neighbourhood of each active path of length at most $l$ is less than $k$, then at the end of this iteration the size of the neighbourhood of each active path of length at most $2l$ is less than $k$.

Proof. Consider an iteration of the loop in Phase II and assume that at the beginning of this iteration the size of the neighbourhood of each active path of length at most $l$ is less than $k$. Let $P$ be a path of length at most $2l$ which is active at the end of this iteration.

If $length(P) \le l$, then the size of the neighbourhood of $P$ is already less than $k$ at the beginning of the iteration, and it can only decrease further.

If $length(P) > l$, then let $P_1$ and $P_2$ be paths of length at most $l$ each whose concatenation is path $P$. The size of the neighbourhood of path $P_i$, $i = 1, 2$, at the beginning of the iteration is less than $k$. Therefore Corollary 1 and Lemma 2 imply that fewer than $k/2$ active nodes are left in the neighbourhood of $P_i$ at the end of the iteration. The neighbourhood of $P$ is the union of the neighbourhoods of $P_1$ and $P_2$, so its size at the end of the iteration must be less than $k$.



Lemma 5. In Phase II, at the beginning of the last iteration, the size of the neighbourhood of each active path is less than $k$.

Proof. Lemma 3 implies that at the beginning of Phase II, the size of the neighbourhood of each active path of length 1 is less than $k$. This fact and Lemma 4 imply that at the end of iteration $i$, the size of the neighbourhood of each active path of length at most $2^i$ is less than $k$. Thus at the end of iteration $\lceil \log k \rceil$, each active path of length at most $k$ has a neighbourhood of size less than $k$. We cannot have an active path of length greater than $k$ at the end of this iteration, because a subpath of length $k$ of such a path would have a neighbourhood of size at least $k$ (the size of the neighbourhood of a path cannot be less than the length of this path). Hence at the beginning of the last iteration, the size of the neighbourhood of each active path is less than $k$.



Lemma 6. At the end of Phase II, either full gossiping is already completed or each active message is in a dormant node.

Proof. If there is at least one dormant node in the network at the beginning of the last iteration in Phase II, then Lemma 5 implies that at this point of the computation, for each active node $v$ and each active path from $v$ to a dormant node $u$, the size of the neighbourhood of this path is less than $k$. Lemma 1 implies that the last iteration in Phase II sends the message from $v$ to $u$.

If there is no dormant node in the network at the beginning of the last iteration in Phase II, then Lemmas 5 and 1 imply that the last iteration in Phase II sends the message from each node to all other nodes, completing full gossiping. (Actually, one can show that in this case full gossiping is completed already by the end of the first iteration in Phase II.)
Theorem 2. The algorithm Gossip2($n$) performs radio gossiping in ad-hoc networks of size $n$ and arbitrary topology in time $O(n^{4/3}\log^{10/3} n)$.

Proof. Since Phases I and II complete quasi-gossiping (Lemma 6), the whole algorithm completes gossiping.

The running time of Phase I is $O(n\log n)$ for all applications of weak selectors, plus $O((n^2/k)\log^2 n)$ for all applications of procedure Disperse (see Lemma 2). The running time of Phase II is $O(k^2\log^3 n\log k)$ for all applications of the path selector, plus $O((n/k + \log k)\, n\log^2 n)$ for all applications of procedure Disperse (see Lemma 2). Thus the total running time of the algorithm is $O((n^2/k + k^2\log k + n\log k)\log^3 n)$, which is $O(n^{4/3}\log^{10/3} n)$ for $k = n^{2/3}/\log^{1/3} n$.
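The choice of $k$ balances the first two terms of the bound $O((n^2/k + k^2\log k + n\log k)\log^3 n)$. The following derivation spells out the arithmetic, using $\log k \le \log n$:

$$\frac{n^2}{k} = k^2\log k \quad\Longrightarrow\quad k^3 \approx \frac{n^2}{\log n} \quad\Longrightarrow\quad k \approx \frac{n^{2/3}}{\log^{1/3} n},$$

$$\frac{n^2}{k} = n^{4/3}\log^{1/3} n, \qquad k^2\log k \le \frac{n^{4/3}}{\log^{2/3} n}\cdot\log n = n^{4/3}\log^{1/3} n, \qquad n\log k = O(n\log n),$$

$$O\!\left(n^{4/3}\log^{1/3} n\cdot\log^3 n\right) = O\!\left(n^{4/3}\log^{10/3} n\right).$$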

4 Conclusion 

In Section 3 we presented two new radio gossiping algorithms. The algorithm Gossip1($n$, $\Delta$) is designed for graphs with max-indegree bounded by $\Delta$. With running time $O(n\Delta\log^2 n)$, this algorithm performs best when the diameter of the network is large (close to $n$) and the max-degree is relatively small ($o(n^{1/3})$). The algorithm Gossip2($n$) is designed for graphs with an arbitrary topology. With running time $O(n^{4/3}\log^{10/3} n)$, this algorithm is, to our knowledge, currently the best known deterministic radio gossiping algorithm in this case.

An obvious open problem is to further close the gap between the best currently known randomised 0(nlog^ n)-time gossiping, given in [12], and our new deterministic $O(n^{4/3}\log^{10/3} n)$-time gossiping procedure. It seems that to improve the deterministic upper bound one would need to introduce new, more adaptive gossiping paradigms. An implication of our main algorithm is that the upper bound for deterministic gossiping is now the same for polynomially large node labels as for linearly large labels. One might gain some further insight into the time complexity of the gossiping problem by looking for cases in which linear node labels enable faster algorithms than polynomially large labels.
Abstract. We consider online scheduling of splittable tasks on parallel machines. In our model, each task can be split into a limited number of parts that can then be scheduled independently. We consider both the case where the machines are identical and the case where some subset of the machines have a (fixed) higher speed than the others. We design a class of algorithms which allows us to give tight bounds for a large class of cases where tasks may be split into relatively many parts. For identical machines we also improve upon the natural greedy algorithm in other classes of cases.



1 Introduction 

In this paper, we consider the problem of distributing tasks on parallel machines, where tasks can be split into a limited number of parts. A possible application of the splittable tasks problem exists in peer-to-peer networks [5]. In such networks large files are typically split and the parts are downloaded simultaneously from different locations, which improves the quality of service (QoS). More generally, computer systems often distribute computation between several processors. This allows the distributed system to speed up the execution of tasks. Naively, it would seem that the fastest way to run a process would be to let all processors participate in the execution of a single process. However, in practice this is impossible. Set-up costs and communication delays limit the amount of parallelism possible. Moreover, some processes may have limited parallelism by nature. In many cases, the best that can be done is that a process may be decomposed into a limited number of pieces, each of which must be run independently on a single machine.

The definition of the model is as follows. In the sequel, we call the tasks “jobs”, as is done in the standard terminology. We consider online scheduling of splittable jobs on $m$ parallel machines. A sequence of jobs is to be scheduled on a set of machines. Unlike the basic model, which assumes that each job is executed on one machine (chosen by the algorithm), for splittable jobs the required processing time $p_j$ of a job $j$ may be split in an arbitrary way into (at most) a given number of parts $\ell$. Those parts become independent and may run in parallel or at different times on different processors. After a decision (on the way a job is split) has been made, the scheduler is confronted with the basic scheduling problem, where each piece of a job is to be assigned non-preemptively to one machine. In the on-line version, jobs are presented to the algorithm in a list; that is, each job must be assigned before the next job is revealed. Only after the process of job splitting and assignment is completed is the next job presented to the algorithm. The goal is to minimize the makespan, which is the last completion time of any part of any job.

We consider two machine models. The first one is the well-known model of identical machines, where all machines have the same speed (w.l.o.g. speed 1). The second case relates to systems where several processors are faster (by some multiplicative factor) than the others. In this case let $s$ be the speed of the fast processors. The other processors have speed 1. This also contains the model where one processor is fast and all others are identical [15,8,3,14]. We call the machines of speed $s$ fast, and all other machines regular. The number of fast machines is denoted by $f$, whereas the number of regular machines is $m - f$. The processing time of job $j$ on a machine of speed $s$ is $p_j/s$. Each machine can process only one job (or part of a job) at a time, and therefore the completion time of a machine is the total processing time of all jobs assigned to it (normalized by the speed), which is also called the load of the machine. In the context of downloading files in a peer-to-peer network, the speeds correspond to the bandwidths of the different connections.

We use competitive analysis: given a problem, we would like to determine its competitive ratio. The competitive ratio of an algorithm is the worst-case ratio between the makespan of the schedule produced by the algorithm and the makespan of an optimal offline algorithm, which receives all input tasks as a set and not one by one. We denote the cost of this optimal offline algorithm by OPT. The competitive ratio of a problem is the best possible competitive ratio that can be achieved by a deterministic on-line algorithm.

Previous work: The basic model (with $\ell = s = 1$) was studied in a sequence of papers, each improving either the upper bound or the lower bound on the competitive ratio [10,7,2,12,1,9,6,11]. The offline splittable jobs problem was studied by Shachnai and Tamir [19]. They showed that the problem is NP-hard (already for identical machines) and gave a PTAS for uniformly related machines. The problem was also studied by Krysta, Sanders and Vöcking [13], who gave an exact algorithm which has polynomial running time for any constant number of uniformly related machines. A different model that is related to ours is scheduling of parallel jobs. In this case, a job has several identical parts that must run simultaneously on a given number of processors [4,16].

Our results: We first analyze a simple greedy-type algorithm that splits jobs into at most $\ell$ parts, while assigning them in a way that the resulting makespan is as small as possible. We then introduce a type of algorithm that always maintains a subset of $k < \ell$ machines with maximal load (while maintaining a given competitive ratio), and show that it is optimal as long as $\ell$ is sufficiently large in relation to $m + f$. The case $f = m - 1$ is treated separately. For smaller $\ell$, we give an algorithm for identical machines that uniformly improves upon our greedy algorithm. Finally, we consider the special case of four identical machines and $\ell = 2$, which is the smallest case for which we did not find an optimal solution. The algorithms assume that it is always possible to compute the value of OPT for the subsequence of jobs that have already arrived. In Section 3 we explain how to compute this value.



2 A Greedy Algorithm 



In this section, we analyze a simple greedy-type algorithm that works as follows. Recall that we consider the case where there is a group of $f$ machines of speed $s > 1$, and the remaining $m - f$ machines have speed 1. For each arriving job, the algorithm finds a way to schedule it on at most $\ell$ machines such that the resulting makespan is as small as possible. This is done by assigning the job to a subset of least loaded fast machines and a subset of least loaded regular machines. To implement this algorithm, we need to consider the combination of the $x$ least loaded regular machines with the $y$ least loaded fast machines, for all feasible cases: $x + y \le \ell$, $0 \le x \le \min\{\ell, m - f\}$ and $0 \le y \le \min\{\ell, f\}$. There are only $O(\ell^2)$ such combinations. If the job is split into fewer than $\ell$ parts, it means that the makespan did not change. Note that for $\ell = s = 1$, this algorithm reduces to the standard greedy algorithm for load balancing.
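A possible Python sketch of one step of this greedy rule follows. It rests on modelling choices of ours, made for illustration only: loads are stored as completion times (normalized by speed), fast machines are exactly those with speed greater than 1, and the best split of a job over a fixed set of machines is found by "water-filling", i.e., by raising the lowest completion times to a common level. The function names are hypothetical.

```python
from typing import List


def water_level(loads: List[float], speeds: List[float], p: float) -> float:
    """Smallest level t with sum_j speeds[j] * max(0, t - loads[j]) >= p.

    This is the best completion time when a job of size p is split over these machines.
    """
    pairs = sorted(zip(loads, speeds))
    level, remaining, cap = pairs[0][0], p, 0.0
    for idx, (_, s) in enumerate(pairs):
        cap += s                                 # total speed of machines at or below `level`
        nxt = pairs[idx + 1][0] if idx + 1 < len(pairs) else float("inf")
        room = cap * (nxt - level)               # work needed to raise the level to `nxt`
        if remaining <= room:
            return level + remaining / cap
        remaining -= room
        level = nxt
    raise AssertionError("unreachable: the last room is infinite")


def greedy_assign(loads: List[float], speeds: List[float], p: float, ell: int) -> float:
    """Split job p over at most ell machines minimising the new makespan; updates loads in place."""
    fast = sorted((i for i in range(len(loads)) if speeds[i] > 1), key=lambda i: loads[i])
    regular = sorted((i for i in range(len(loads)) if speeds[i] == 1), key=lambda i: loads[i])
    best_level, best_set = float("inf"), []
    for y in range(min(ell, len(fast)) + 1):              # y least loaded fast machines
        for x in range(min(ell - y, len(regular)) + 1):   # x least loaded regular machines
            chosen = fast[:y] + regular[:x]
            if not chosen:
                continue
            t = water_level([loads[i] for i in chosen], [speeds[i] for i in chosen], p)
            if t < best_level:
                best_level, best_set = t, chosen
    for i in best_set:
        loads[i] = max(loads[i], best_level)   # fill up to the level; machines above it get nothing
    return max(loads)


loads = [0.0, 0.0, 0.0]
print(greedy_assign(loads, [1, 1, 1], 6.0, 2), loads)  # 3.0 [3.0, 3.0, 0.0]
```

In the example, splitting the job of size 6 over two of three identical machines gives makespan 3 rather than 6; a subsequent job of size 3 would go entirely to the empty third machine.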

Consider an arbitrary subset $S$ of $\ell$ machines, and denote the number of fast machines in this subset by $g$. Consider the time when the maximum load is achieved for the first time. This happened after assigning a job to exactly $\ell$ machines. Denote the total processing time scheduled on the $i$-th machine in subset $S$ by $W_i^S$ ($i = 1, \ldots, \ell$). Let $x$ be the job that achieves the maximum load (and, by a slight abuse of notation, also denote its processing time by $x$). Let $W = \sum_{i=1}^{m} W_i$, i.e., the total processing time of all jobs right before the assignment of $x$. Let Greedy denote the makespan of the greedy algorithm. By our assignment, we have for any subset $S$

$$\mathrm{Greedy} \le \frac{W_1^S + \cdots + W_\ell^S + x}{sg + \ell - g}, \quad\text{i.e.,}\quad (sg + \ell - g)\,\mathrm{Greedy} \le W_1^S + \cdots + W_\ell^S + x.$$

There are $\binom{m}{\ell}$ such subsets, and each machine occurs in $\binom{m-1}{\ell-1}$ of them. Summing the above inequality over all subsets, we have that each time a fast machine occurs, it contributes $s$ to the left-hand side; a regular machine contributes 1. Thus

$$\mathrm{Greedy}\cdot\left(s\binom{m-1}{\ell-1}f + \binom{m-1}{\ell-1}(m - f)\right) \le \binom{m-1}{\ell-1}(W_1 + \cdots + W_m) + \binom{m}{\ell}x,$$

or, since $\binom{m}{\ell} = \frac{m}{\ell}\binom{m-1}{\ell-1}$,

$$(sf + m - f)\,\mathrm{Greedy} \le W + \frac{m}{\ell}\,x.$$

Furthermore, we have $\mathrm{OPT} \ge (W + x)/(sf + m - f)$. If $f \ge \ell$, we also have $\mathrm{OPT} \ge x/(s\ell)$; otherwise $\mathrm{OPT} \ge x/(sf + \ell - f)$. Thus if $f \ge \ell$,

$$\mathrm{Greedy} \le \frac{W + x\,m/\ell}{sf + m - f} \le \mathrm{OPT} + \frac{s\ell\,\mathrm{OPT}\,(m/\ell - 1)}{sf + m - f} = \left(1 + \frac{s(m - \ell)}{sf + m - f}\right)\mathrm{OPT},$$

and otherwise

$$\mathrm{Greedy} \le \mathrm{OPT} + \frac{(sf + \ell - f)\,\mathrm{OPT}\,(m/\ell - 1)}{sf + m - f} = \left(1 + \frac{(sf + \ell - f)(m - \ell)}{\ell\,(sf + m - f)}\right)\mathrm{OPT}.$$

These ratios are decreasing in $\ell$ and are 1 for $\ell = m$. For $f = 0$ (or equivalently $s = 1$) the second ratio applies, which then becomes $2 - \ell/m$. For larger $f$, the ratio is lower.
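The two greedy ratio bounds, $1 + s(m-\ell)/(sf+m-f)$ for $f \ge \ell$ and $1 + (sf+\ell-f)(m-\ell)/(\ell(sf+m-f))$ otherwise, can be evaluated exactly with a small Python helper (the name is hypothetical). Exact rational arithmetic avoids rounding, so the special cases $2 - \ell/m$ for $f = 0$ and ratio 1 for $\ell = m$ can be checked directly.

```python
from fractions import Fraction


def greedy_ratio_bound(m: int, f: int, ell: int, s) -> Fraction:
    """Upper bound on Greedy/OPT: m machines, f of them fast with speed s, at most ell parts per job."""
    s = Fraction(s)
    total_speed = s * f + m - f               # sum of all speeds, sf + m - f
    if f >= ell:                              # the ell fastest machines are all fast
        return 1 + s * (m - ell) / total_speed
    return 1 + (s * f + ell - f) * (m - ell) / (ell * total_speed)


print(greedy_ratio_bound(4, 2, 2, 2))  # 5/3
```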
3 Computing OPT

Throughout the paper we assume that the value of OPT is known to the on-line algorithm. There are several options to achieve this knowledge. The algorithm of Krysta, Sanders and Vöcking [13] can solve the offline problem exactly, in time that is polynomial when the number of machines is regarded as a constant. The drawback is that their algorithm must be rerun after every arrival of a job to find out the new value of OPT. Another, better option is simply to use the following two lower bounds on OPT: the sum of the processing times of all jobs divided by the sum of speeds, and the size of the largest job divided by the sum of speeds of the $\ell$ fastest machines. We already used these bounds in Section 2.
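The two lower bounds on OPT translate directly into code. A minimal Python sketch follows; the function name is hypothetical and speeds are given as a plain list, one entry per machine.

```python
from typing import Sequence


def opt_lower_bound(jobs: Sequence[float], speeds: Sequence[float], ell: int) -> float:
    """max(total work / total speed, largest job / speed of the ell fastest machines)."""
    volume_bound = sum(jobs) / sum(speeds)
    top_speed = sum(sorted(speeds, reverse=True)[:ell])
    return max(volume_bound, max(jobs) / top_speed)


print(opt_lower_bound([6.0, 3.0], [1, 1, 1], 2))  # 3.0
```

Both bounds can be maintained incrementally as jobs arrive (a running sum and a running maximum), so recomputing them per job is cheap.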

All the proofs of upper bounds use only these bounds on OPT, and therefore knowledge of the actual value of OPT is not required. Naturally, these bounds are not always tight, as the offline problem is NP-complete already for identical machines and any constant $\ell$ [19]. Note that in almost all cases in this paper where we obtain tight bounds on the competitive ratio, the value of OPT is actually given by the maximum of the two bounds on OPT. This is always true for $\ell \ge (m + 1)/2$. In these cases an optimal offline schedule (not only its cost) can be computed by the following algorithm. This algorithm works for the general case of uniformly related machines (where each machine $i$ has some speed $s_i$).

Algorithm. Calculate the value of OPT. We say that a job fits on a subset of machines if it can be placed there without any machine exceeding a load of OPT (normalized by the speed). Sort the machines by nondecreasing speeds.

Consider the largest job $J$. Clearly it fits on the $\ell$ fastest machines. Let $i$ be an index such that $J$ fits on machines $i, \ldots, i + \ell - 1$, where all these machines except possibly the last are used completely. If there is such an $i$, assign $J$ there. We are left with machines $1, \ldots, i - 1, i + \ell, \ldots, m$ and possibly a part of machine $i + \ell - 1$. This is a subset of at most $\ell$ machines, since $\ell \ge (m + 1)/2$. Hence the remaining jobs can be split perfectly among these machines. Since the other machines are filled completely, they must all fit.

If there is no such index $i$, then $J$ fits on machines $1, \ldots, \ell - 1$ or on fewer machines (note that these are the slowest machines). This implies that all jobs fit on at most $\ell$ machines each: we need to add 1 to the number of machines used for one job since, for later jobs, both the first machine of the job and the last one can be occupied partially by other jobs. Hence all jobs can be assigned without wasting any space.

4 Algorithm High($k$, $\mathcal{R}$)

An important algorithm that we work with is the following, called High($k$, $\mathcal{R}$). It maintains the invariant that there are at least $k$ regular machines with load exactly $\mathcal{R}$ times the optimal load, where $\mathcal{R}$ is the competitive ratio that we want to prove. Clearly such an invariant can only be maintained for $k$ at most equal to $\ell - 1$ (consider the assignment of the first job), and in certain cases $k$ has to be chosen even lower than that to get the best ratio. We will use this algorithm in the context of identical machines and in the case where there are several fast machines of speed $s$. Recall that the identical machines case is a special case of the second case (with $s = 1$). We immediately present the more general algorithm.
On arrival of a job $J$ of size $x$, High($k$, $\mathcal{R}$) assigns the job to at most $\ell$ machines such that the invariant is kept. We denote the optimal makespan before the arrival of $J$ by $\mathrm{OPT}_1$, and after the arrival of $J$ by $\mathrm{OPT}_2$. We would like to sort the machines by the amount of processing they can accommodate. For a machine $i$, let $L_i$ be its load and $s'_i$ be its speed ($s'_i = 1$ or $s'_i = s$). Let $b_i$ be the gap on machine $i$, which is the maximum amount of processing that can be placed on the machine in this step. That is, $b_i = s'_i(\mathcal{R}\cdot\mathrm{OPT}_2 - L_i)$ for $i = 1, \ldots, m$. We first sort only the regular machines in non-increasing order of their gaps. Clearly, the machines which had load $\mathcal{R}\cdot\mathrm{OPT}_1$ have the smallest gap. We get

$$b_1 \ge b_2 \ge \cdots \ge b_{m-f} \quad\text{and}\quad b_{m-f-k+1} = \cdots = b_{m-f} = \mathcal{R}\cdot\mathrm{OPT}_2 - \mathcal{R}\cdot\mathrm{OPT}_1.$$

Let $S_i = b_i + \cdots + b_{i+k-1}$ for $1 \le i \le m - f - k + 1$. This is the sum of the gaps on $k$ consecutive regular machines. The algorithm can work only under the condition that $S_{m-f-k+1} \le x$: if $x$ is smaller, then after assigning $x$ there are fewer than $k$ machines with load $\mathcal{R}\cdot\mathrm{OPT}_2$. We distinguish two cases.

Case 1: $S_1 \ge x$. We can find a value $i$ such that $S_i \ge x$ and either $i = m - f - k + 1$ or $S_{i+1} < x$. If $S_i = x$, we can clearly assign $J$ such that there are $k$ regular machines with load $\mathcal{R}\cdot\mathrm{OPT}_2$.

Suppose $S_i > x$. Then $i \le m - f - k$, since $S_{m-f-k+1} \le x$. We use the machines $i, \ldots, i + k$. This is a set of $k + 1$ machines. We add $b_j$ to machine $j$ for $j = i + 1, \ldots, i + k$ and put the nonzero remainder on machine $i$. The remainder fits there since the job can fit on machines $i, \ldots, i + k - 1$ even without machine $i + k$. Clearly we get at least $k$ regular machines with load $\mathcal{R}\cdot\mathrm{OPT}_2$. The assignment is feasible since $\ell \ge k + 1$.
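Case 1 can be sketched in Python as follows. The input is the list of gaps $b_1 \ge \cdots \ge b_{m-f}$ of the regular machines (0-based positions in the code), and the function returns how much of the job goes to each position. It assumes $S_{m-f-k+1} \le x \le S_1$; Case 2 and the fast machines are not modelled, and the function name is hypothetical.

```python
from typing import Dict, List


def high_case1(gaps: List[float], k: int, x: float) -> Dict[int, float]:
    """Case 1 of High(k, R) on regular machines sorted by non-increasing gap."""
    S = [sum(gaps[i:i + k]) for i in range(len(gaps) - k + 1)]   # window sums S_1, S_2, ...
    if not S[-1] <= x <= S[0]:
        raise ValueError("requires S_last <= x <= S_1")
    i = max(j for j in range(len(S)) if S[j] >= x)   # last window that can still absorb x
    if S[i] == x:
        return {j: gaps[j] for j in range(i, i + k)}  # fill k gaps exactly
    # S[i] > x: fill positions i+1..i+k exactly and put the remainder on position i
    plan = {j: gaps[j] for j in range(i + 1, i + k + 1)}
    plan[i] = x - sum(plan.values())
    return plan


print(high_case1([5, 4, 3, 1, 1], 2, 6))  # {2: 3, 3: 1, 1: 2}
```

In the example, positions 2 and 3 are filled exactly to $\mathcal{R}\cdot\mathrm{OPT}_2$, and the remainder 2 fits into the gap of size 4 at position 1.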

Case 2: $S_1 < x$. Here we introduce another condition, which is the following. Consider the $k$ regular machines with the largest gaps and, among the machines that are not the $k$ regular machines with the smallest gap, choose another $\ell - k$ machines with the largest gaps. The condition for the algorithm to succeed is that the sum of these $\ell$ gaps is at least the size $x$. The assignment of $x$ first fills the gaps on the $k$ least loaded regular machines, and the non-zero remainder is spread among the $\ell - k$ machines with the largest gaps.

We use this algorithm several times in this paper. Each time, to show that it maintains some competitive ratio $\mathcal{R}$, we will show the following two properties.

1. A new job is never too large to be placed as described. That is, if we place it on the $\ell$ machines, $k$ of which are the regular machines with the largest gaps and the other $\ell - k$ of which are the machines with the largest gaps among the others (excluding the regular machines that had maximum load before), then afterwards the load on these machines is at most $\mathcal{R}\cdot\mathrm{OPT}_2$.

2. A new job is never too small for the invariant to be maintained. That is, if we assign the job to the $k$ machines that had load $\mathcal{R}\cdot\mathrm{OPT}_1$, then it fits exactly in the gaps, or there is a remainder. This will show that in all cases we can make at least $k$ machines have load $\mathcal{R}\cdot\mathrm{OPT}_2$.

Note that for each arriving job, the new value of OPT can be computed in $O(1)$ time, and the most expensive step of algorithm $\textsc{High}(k,\mathcal{R})$ with regard to time complexity is maintaining the sorted order of the regular machines, which can be done efficiently (for example, with a balanced search tree).
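The $O(1)$ update of OPT can be sketched as below. The sketch assumes, following the three candidate values listed in the proof of Lemma 1, that the offline optimum equals $\max(W/m', p_{\max}/\ell')$, where $W$ is the total size so far, $p_{\max}$ the largest job, $m'$ the total speed and $\ell'$ the total speed of the $\ell$ fastest machines. The class name `OnlineOpt` is hypothetical; exact fractions avoid rounding in the comparison.

```python
from fractions import Fraction


class OnlineOpt:
    """Hypothetical O(1)-per-job tracker of OPT = max(W / m', p_max / ell')."""

    def __init__(self, m_prime, ell_prime):
        self.m_prime = Fraction(m_prime)      # total speed m'
        self.ell_prime = Fraction(ell_prime)  # total speed of the ell fastest machines
        self.total = Fraction(0)              # W: total size of the jobs so far
        self.largest = Fraction(0)            # largest job so far

    def add_job(self, x):
        """Record a job of size x and return the new offline optimum."""
        self.total += x
        self.largest = max(self.largest, Fraction(x))
        return max(self.total / self.m_prime, self.largest / self.ell_prime)
```

For four identical machines and two parts ($m' = 4$, $\ell' = 2$), jobs of sizes 2, 6 and 1 give OPT values 1, 3 and 3.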
4.1 Many Splits

We consider the case $\ell \ge (m+f)/2$ (since $k \le \ell - 1$, if $f = 0$ we need $\ell \ge (m+1)/2$). Note that this leaves open the case of $f = m-1$. This case will be considered separately in the next subsection.

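We need some definitions in order to state the next lemma. Let $\ell'$ be the sum of the speeds of the $\ell$ fastest machines and let $m'$ be the sum of all speeds. Clearly $\ell > f$, and so $\ell' = sf + \ell - f$ and $m' = sf + m - f$. Let $c = \ell'/m'$ and

$$\mathcal{R}_1(c) = \frac{1}{1 - c + c^2} = \frac{(m')^2}{(m')^2 - \ell'm' + (\ell')^2}.$$

Note that $\mathcal{R}_1(c) = \mathcal{R}_1(1-c)$. Finally, let $c_1$ be the real solution to $c^3 - c^2 + 2c - 1 = 0$ ($c_1 \approx 0.56984$).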
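The constant $c_1$ and the symmetry of $\mathcal{R}_1$ are easy to check numerically. The sketch below (helper names hypothetical) finds the real root of $c^3 - c^2 + 2c - 1$ by bisection on $[0,1]$; the polynomial is strictly increasing because its derivative $3c^2 - 2c + 2$ has no real roots, so the root is unique.

```python
def r1(c: float) -> float:
    """R_1(c) = 1 / (1 - c + c^2)."""
    return 1.0 / (1.0 - c + c * c)


def c1(tol: float = 1e-12) -> float:
    """Real root of c^3 - c^2 + 2c - 1 = 0, found by bisection on [0, 1]."""

    def poly(c: float) -> float:
        return c ** 3 - c ** 2 + 2 * c - 1

    lo, hi = 0.0, 1.0  # poly(0) = -1 < 0 < 1 = poly(1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poly(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


root = c1()
print(f"c_1 ~ {root:.5f}, R_1(c_1) ~ {r1(root):.4f}")
print(abs(r1(0.3) - r1(0.7)) < 1e-12)  # symmetry R_1(c) = R_1(1 - c)
```

The first line printed is `c_1 ~ 0.56984, R_1(c_1) ~ 1.3247`.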



Lemma 1. For $c \ge c_1$, algorithm $\textsc{High}(m-\ell, \mathcal{R}_1(c))$ maintains a competitive ratio of $\mathcal{R}_1(c)$.



Proof. Let $k = m - \ell \le \ell - 1$. We first show that the new job is never too large to be placed as described. If it is put on the $\ell$ machines which are all machines that did not have maximum load before the arrival of $J$, then the other $k = m - \ell$ regular machines have load $\mathcal{R}_1(c)\mathrm{OPT}_1$ because of the invariant (they were the machines with the highest load). Thus we need to show that $\ell'\mathcal{R}_1(c)\mathrm{OPT}_2 + k\mathcal{R}_1(c)\mathrm{OPT}_1 \ge W + x$, where $W$ is the total load of all the jobs before $J$ arrived.

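We have $\mathrm{OPT}_1 \ge W/m'$, $\mathrm{OPT}_2 \ge (W+x)/m'$ and $\mathrm{OPT}_2 \ge x/\ell'$. Therefore

$$\mathrm{OPT}_2 \ge \alpha\,\frac{W+x}{m'} + (1-\alpha)\,\frac{x}{\ell'} \quad\text{for any } 0 \le \alpha \le 1. \tag{1}$$

Taking $\alpha = \ell'/m'$ and using $k = m - \ell = m' - \ell'$, we get

$$k\,\mathrm{OPT}_1 + \ell'\,\mathrm{OPT}_2 \ge \frac{kW}{m'} + \frac{\ell'\alpha(W+x)}{m'} + \frac{\ell'(1-\alpha)x}{\ell'} = (W+x)\left(\frac{\alpha\ell'}{m'} + 1 - \alpha\right) = (W+x)\left(1 - \frac{\ell'}{m'} + \frac{(\ell')^2}{(m')^2}\right) = \frac{W+x}{\mathcal{R}_1(c)},$$

as needed.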

Second, we show that $J$ is always large enough that we can again make $k$ regular machines have load $\mathcal{R}_1(c)\mathrm{OPT}_2$. That is, $x \ge k\mathcal{R}_1(c)(\mathrm{OPT}_2 - \mathrm{OPT}_1)$. There are three possibilities for $\mathrm{OPT}_2$: it is either $x/\ell'$, $(W+x)/m'$ or $y/\ell'$, where $y$ is the processing time of some old job.

If $\mathrm{OPT}_2 = y/\ell'$ we are done, since then $\mathrm{OPT}_1 = y/\ell'$ as well. Otherwise, we use that $\mathrm{OPT}_1 \ge W/m'$. Thus $\mathrm{OPT}_2 - \mathrm{OPT}_1 \le \max\{x/\ell', x/m'\} = x/\ell'$. We need to show that $k\mathcal{R}_1(c)x/\ell' \le x$, or $k\mathcal{R}_1(c) \le \ell'$. Since $k/\ell' = (1-c)/c$, this holds if $c^3 - c^2 + 2c - 1 \ge 0$, which holds for $c \ge c_1$. This completes the proof of the upper bound of $\textsc{High}(m-\ell, \mathcal{R}_1(c))$. □
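Both steps of the proof can be spot-checked with exact rational arithmetic. The sketch below takes arbitrary example values (assumptions, not from the text) for $m, f, s, \ell, W, x$, uses $k = m - \ell = m' - \ell'$ and $\alpha = \ell'/m'$, checks that the lower bound on $k\,\mathrm{OPT}_1 + \ell'\mathrm{OPT}_2$ equals $(W+x)/\mathcal{R}_1(c)$, and reports whether $k\mathcal{R}_1(c) \le \ell'$ holds. The function name is hypothetical.

```python
from fractions import Fraction as F


def lemma1_check(m, f, s, ell, W, x):
    """Spot-check both steps of the proof of Lemma 1 for one parameter choice.

    Assumes ell > f, so that ell' = s*f + ell - f. Returns (bound, target, ok):
    bound is the lower bound on k*OPT_1 + ell'*OPT_2 obtained with alpha = ell'/m',
    target is (W + x) / R_1(c), and ok tells whether k * R_1(c) <= ell'.
    """
    s, W, x = F(s), F(W), F(x)
    m_p = s * f + m - f       # m'
    ell_p = s * f + ell - f   # ell'
    k = m - ell               # equals m' - ell'
    c = ell_p / m_p
    r1 = 1 / (1 - c + c * c)
    alpha = ell_p / m_p
    bound = k * W / m_p + ell_p * (alpha * (W + x) / m_p + (1 - alpha) * x / ell_p)
    return bound, (W + x) / r1, k * r1 <= ell_p
```

With $m = 5$, $f = 1$, $s = 2$, $\ell = 4$ we get $c = 5/6 \ge c_1$ and both checks pass; with $m = 10$, $f = 0$, $\ell = 5$ ($c = 1/2 < c_1$) the second condition fails, as expected.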



Lemma 2. No algorithm for the scheduling of $\ell$-splittable jobs on a system of $f$ fast machines of speed $s$ and $m - f$ regular machines has a better competitive ratio than $\mathcal{R}_1(c)$.

Proof. The values $m'$ and $\ell'$ are defined as above. Thus $m' = sf + m - f$. Furthermore, $\ell'$ is the sum of the speeds of the $\ell$ fastest machines, so $\ell' = sf + \ell - f$ if $\ell \ge f$, and $\ell' = s\ell$ otherwise. The lower bound consists of very small jobs of total size $m' = sf + m - f$, followed by a single job of size $W - m'$, where $W$ will be determined later. The optimal offline makespan after the small jobs is $\mathrm{OPT}_1 = 1$, and after the large job it is $\mathrm{OPT}_2 = W/m'$.

Consider an online algorithm $A$. After the small jobs have arrived, the algorithm "knows" it has to keep room for another single job. Therefore it can load the $m - \ell$ machines it is not going to use for that job with the maximum load $\mathcal{R}\,\mathrm{OPT}_1$ (if it puts more on some machine, the final job does not arrive). There are many cases according to how many fast machines it loads. Let $k_1$ be the number of fully loaded regular machines and $k_2 = m - \ell - k_1$ the number of fully loaded fast machines.

If $A$ maintains a competitive ratio of $\mathcal{R}$, we must have that $W \le \mathcal{R}\,\mathrm{OPT}_1(k_1 + sk_2) + \mathcal{R}\,\mathrm{OPT}_2\big((m - f - k_1) + s(f - k_2)\big)$. This implies

$$\mathcal{R} \ge \frac{W}{m - \ell - k_2 + sk_2 + \mathrm{OPT}_2(k_2 + \ell - f + sf - sk_2)}. \tag{2}$$

We can see that this number is minimized by minimizing $k_2$, since the coefficient of $k_2$ in the denominator is $(\mathrm{OPT}_2 - 1)(1 - s) < 0$. Therefore the lower bound is obtained by taking $k_2 = 0$ if $\ell \ge f$, and $k_2 = f - \ell$ otherwise. We choose $W$ such that $W - m' = m'\ell'/(m' - \ell')$. We rewrite (2) to get $W \le (m' - \ell')\mathcal{R}\,\mathrm{OPT}_1 + \ell'\mathcal{R}\,\mathrm{OPT}_2$. Then, since $\mathrm{OPT}_1 = 1$ and since $W = (m')^2/(m' - \ell')$ implies $\mathrm{OPT}_2 = m'/(m' - \ell')$, we get

$$\mathcal{R} \ge \frac{(m')^2}{(m')^2 - m'\ell' + (\ell')^2} = \mathcal{R}_1(c). \qquad \square$$
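The adversary of Lemma 2 can be replayed numerically. The sketch below (function name hypothetical; it assumes $s > 1$) sets $W = (m')^2/(m' - \ell')$, enumerates every feasible number $k_2$ of fully loaded fast machines, evaluates the right-hand side of (2), and returns its minimum together with $\mathcal{R}_1(c)$.

```python
from fractions import Fraction as F


def lemma2_bound(m, f, s, ell):
    """Replay the adversary of Lemma 2 (assumes fast speed s > 1).

    Returns (min over feasible k2 of the right side of (2), R_1(c)).
    """
    s = F(s)
    m_p = s * f + m - f                               # m'
    ell_p = s * f + ell - f if ell >= f else s * ell  # ell'
    W = m_p ** 2 / (m_p - ell_p)  # chosen total size; OPT_1 = 1
    opt2 = W / m_p                # OPT_2 = W / m' = m' / (m' - ell')
    best = None
    for k2 in range(0, min(f, m - ell) + 1):  # fully loaded fast machines
        k1 = m - ell - k2                      # fully loaded regular machines
        if k1 > m - f:
            continue  # not enough regular machines for this split
        denom = m - ell - k2 + s * k2 + opt2 * (k2 + ell - f + s * f - s * k2)
        bound = W / denom
        best = bound if best is None else min(best, bound)
    c = ell_p / m_p
    return best, 1 / (1 - c + c * c)
```

For $m = 5$, $f = 1$, $s = 2$, $\ell = 4$ both values are $36/31$, attained at $k_2 = 0$; for $m = 5$, $f = 3$, $s = 2$, $\ell = 2$ (so $\ell < f$) both are $4/3$, attained at $k_2 = f - \ell = 1$.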

These two lemmas imply the following theorem. 



Theorem 1. For $\ell'/m' \ge c_1$ and $\ell \ge (m + \max(f,1))/2$ (i.e. $f \ne m-1$), the algorithm $\textsc{High}(m-\ell, \mathcal{R}_1(c))$ is well-defined and optimal.



4.2 The Case of $f = m-1$ Fast Machines

For completeness, in this section we consider the case $f = m-1$. We give tight bounds for many cases, including the case of $m-1$ parts, i.e. each job may run on all machines but one. We have already solved the cases $f = 0,\dots,m-2$ and $f = m$ (this is the same case as $f = 0$) for large enough $\ell$. The solution of the case $f = m-1$ is very different from the other cases. First, the algorithm is not the same for all values of $s$. For small $s$, for the first time we use an invariant on the fast machines. For large $s$, for the first time we do not use all the machines. Again we use $m'$ as the sum of all speeds, i.e. $m' = (m-1)s + 1$, and $\ell'$ as the sum of the speeds of the $\ell$ fastest machines, i.e. $\ell' = s\ell$. We introduce a new notation $k'$, which is the sum of the speeds of the machines that are kept at maximum load. This value is determined by the algorithm.

For large $s$, we use an algorithm which never uses the regular machine. For the case $\ell = m-1$ it is a simple greedy algorithm that splits each job in a way that keeps the load balanced on all fast machines. This gives the algorithm the ratio $1 + \frac{1}{s(m-1)}$ (easily proved by area considerations). For $\ell < m-1$ the algorithm ignores the regular machine and uses $\textsc{High}(m-1-\ell, \mathcal{R}_{21})$ on the $m-1$ fast machines only, where $\mathcal{R}_{21}$ is defined as a function of $m'$ and $\ell'$ (which are functions of $m$, $\ell$ and $s$):

$$\mathcal{R}_{21} = \frac{(m')^2}{(m')^2 - (m'-\ell')(\ell'+1)} = \frac{(m')^2}{(m')^2 - m' - k'\ell'}.$$
We have $k' = sk = s(m-\ell-1)$. The algorithm keeps $k = m-\ell-1$ fast machines with load $\mathcal{R}_{21}\mathrm{OPT}$. Since $k$ must be smaller than $\ell$, we require $\ell \ge m/2$.
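The two expressions for $\mathcal{R}_{21}$ agree because $m' - \ell' = s(m-\ell-1) + 1 = k' + 1$, so $(m' - \ell')(\ell' + 1) = k'\ell' + m'$. The sketch below (function name hypothetical) checks this with exact fractions and confirms that for $\ell = m-1$ (so $k' = 0$) the ratio reduces to $1 + 1/(s(m-1))$, the ratio of the greedy algorithm.

```python
from fractions import Fraction as F


def r21_both_forms(m, ell, s):
    """R_21 for f = m - 1 fast machines of speed s, via both formulas."""
    s = F(s)
    m_p = (m - 1) * s + 1     # m': total speed
    ell_p = s * ell           # ell': the ell fastest machines are all fast
    k_p = s * (m - ell - 1)   # k': speed kept at maximum load
    first = m_p ** 2 / (m_p ** 2 - (m_p - ell_p) * (ell_p + 1))
    second = m_p ** 2 / (m_p ** 2 - m_p - k_p * ell_p)
    return first, second
```

For example, $m = 6$, $\ell = 4$, $s = 3$ gives $m' = 16$, $\ell' = 12$, $k' = 3$, and both forms equal $64/51$.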

On arrival of a job, let $\mathrm{OPT}_1$ and $\mathrm{OPT}_2$ be the optimal offline makespan before and after the arrival of the new job, respectively. The algorithm is the same as before but the properties are slightly different. We need to show that the following two properties hold:

1. $x \ge k'\mathcal{R}(\mathrm{OPT}_2 - \mathrm{OPT}_1)$.

2. The gaps on the $\ell$ least loaded fast machines can contain $x$.

The second property can be reformulated as

$$\ell'\mathcal{R}\,\mathrm{OPT}_2 + k'\mathcal{R}\,\mathrm{OPT}_1 \ge W + x,$$

where $W$ is the total processing time of jobs which arrived before the job of processing time $x$. This follows from $\ell + k = m-1$. Regarding the first property, similarly to before, we can bound the difference of the optimal offline costs by $\mathrm{OPT}_2 - \mathrm{OPT}_1 \le x/\ell'$. This gives the condition $\mathcal{R}_{21} \le \ell'/k'$.

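To show the second property we again use the bounds $\mathrm{OPT}_1 \ge W/m'$ and (1). We need to show

$$\frac{k'W}{m'} + \ell'\left(\alpha\,\frac{W+x}{m'} + (1-\alpha)\,\frac{x}{\ell'}\right) \ge \frac{W+x}{\mathcal{R}}.$$

Taking $1 - \alpha = k'/m'$, we get that this condition is satisfied for $\mathcal{R} = \mathcal{R}_{21}$.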

For small $s$, we use a variation on previous algorithms. The algorithm keeps $k = m-\ell$ fast machines with load $\mathcal{R}_{22}\mathrm{OPT}$, where

$$\mathcal{R}_{22} = \frac{(m')^2}{(m')^2 - (m' + s - 1 - \ell')\ell'} = \frac{(m')^2}{(m')^2 - k'\ell'}. \tag{3}$$

The value we use for $k'$ is $k' = s(m-\ell)$. The algorithm is defined as $\textsc{High}(m-\ell, \mathcal{R}_{22})$, except that the roles of the fast machines and the regular machine have been reversed. In other words, we use the gaps on fast machines to fit the job, and if it needs more room we use at most $m-k-1$ fast machines and the regular machine as well.

On arrival of a job, let $\mathrm{OPT}_1$ and $\mathrm{OPT}_2$ be the optimal offline makespan before and after the arrival of the new job, respectively. We again need the following two properties to hold:

1. $x \ge k'\mathcal{R}(\mathrm{OPT}_2 - \mathrm{OPT}_1)$.

2. The gaps on the $m-k$ other machines (that do not maintain the invariant) can contain $x$, that is,

$$(m' - k')\mathcal{R}\,\mathrm{OPT}_2 + k'\mathcal{R}\,\mathrm{OPT}_1 \ge W + x.$$

The first property again translates into $\mathcal{R}_{22} \le \ell'/k'$. To show the second property we again use the bounds $\mathrm{OPT}_1 \ge W/m'$ and (1). We need to show

$$\frac{k'W}{m'} + (m'-k')\left(\alpha\,\frac{W+x}{m'} + (1-\alpha)\,\frac{x}{\ell'}\right) \ge \frac{W+x}{\mathcal{R}}.$$

Taking $1 - \alpha = \frac{k'\ell'}{m'(m'-k')}$, we get that this condition is satisfied for $\mathcal{R} = \mathcal{R}_{22}$.

We now give a lower bound that proves that these bounds are tight. The lower bound 
is actually more general, and holds for all values of £ and s. 
Lemma 3. For $f = m-1$, any online algorithm has a competitive ratio of at least $\min(\mathcal{R}_{21}, \mathcal{R}_{22})$.

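Proof. We define a sequence of jobs with the following processing times: $p_1 = 1$ and $p_j = \frac{\ell'}{m'-\ell'}\sum_{i=1}^{j-1} p_i$ for $j \ge 2$. Let $\mathrm{OPT}_j$ be the optimal offline cost on the subsequence of the first $j$ jobs. Then we see that for $j \ge 3$ we have

$$\mathrm{OPT}_j = \frac{\sum_{i=1}^{j} p_i}{m'} = \frac{p_j}{\ell'} \quad\text{and}\quad p_j = \frac{m'}{m'-\ell'}\,p_{j-1}.$$

Consider the behavior of the online algorithm starting from the third job.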

If the algorithm never splits a job using the regular machine, we need to consider two cases. If $\ell = m-1$, the competitive ratio tends to the ratio $1 + \frac{1}{s(m-1)}$ of the greedy algorithm that does not use the regular machine. The second case, $\ell \le m-2$, is slightly more difficult. Only the first two jobs might be scheduled on the regular machine. Consider job $j$. If $A$ maintains a competitive ratio of $\mathcal{R}$ until this point, then on each of the fast machines that it does not use for job $j$ it has placed at most $s\mathcal{R}\,\mathrm{OPT}_{j-1}$, and we find

$$\frac{\sum_{i=3}^{j} p_i - (m-\ell-1)s\mathcal{R}\,\mathrm{OPT}_{j-1}}{\ell'} \le \mathcal{R}\,\mathrm{OPT}_j,$$

which implies that $\mathcal{R}\big(\ell'\mathrm{OPT}_j + s(m-\ell-1)\mathrm{OPT}_{j-1}\big) + p_1 + p_2 \ge \sum_{i=1}^{j} p_i$. We use $\sum_{i=1}^{j} p_i = p_j + \sum_{i=1}^{j-1} p_i = p_j\left(1 + \frac{m'-\ell'}{\ell'}\right) = \frac{m'}{\ell'}p_j$ to rewrite this condition in terms of $p_j$, and divide by $p_j$. For large enough $j$ we can neglect $p_1$ and $p_2$ and find

$$\mathcal{R}\left(1 + \frac{s(m-\ell-1)(m'-\ell')}{m'\ell'}\right) \ge \frac{m'}{\ell'}.$$

This gives $\mathcal{R} \ge \mathcal{R}_{21}$.

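Otherwise (some job uses the regular machine), let $j$ be the index of the first job for which a part is assigned to the regular machine. If $A$ maintains a competitive ratio of $\mathcal{R}$ until this point, then on the machines that it does not use for job $j$ (which are all fast) it has placed at most $s\mathcal{R}\,\mathrm{OPT}_{j-1}$, and we find

$$\frac{\sum_{i=1}^{j} p_i - s(m-\ell)\mathcal{R}\,\mathrm{OPT}_{j-1}}{s(\ell-1)+1} \le \mathcal{R}\,\mathrm{OPT}_j,$$

which implies that $\mathcal{R}\big(\mathrm{OPT}_j(s(\ell-1)+1) + s(m-\ell)\mathrm{OPT}_{j-1}\big) \ge \sum_{i=1}^{j} p_i$. We use $\sum_{i=1}^{j} p_i = \frac{m'}{\ell'}p_j$ to rewrite this condition in terms of $p_j$, and divide by $p_j$ to find

$$\mathcal{R}\left(\frac{s\ell - s + 1}{\ell'} + \frac{s(m-\ell)(m'-\ell')}{m'\ell'}\right) \ge \frac{m'}{\ell'},$$

which leads to $\mathcal{R} \ge \mathcal{R}_{22}$. □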
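The job sequence in the proof can be generated directly. The sketch below (function name hypothetical) assumes $p_1 = 1$ and $p_j = \frac{\ell'}{m'-\ell'}\sum_{i<j} p_i$, which is consistent with the identity $\sum_{i \le j} p_i = (m'/\ell')p_j$ used in the proof, and computes $\mathrm{OPT}_j$ as $\max\big(\sum_{i \le j} p_i/m',\ \max_{i \le j} p_i/\ell'\big)$, an assumption taken from the candidate values in the proof of Lemma 1. It then checks both claimed identities from the third job onwards.

```python
from fractions import Fraction as F


def lemma3_jobs(m, ell, s, n):
    """First n jobs of the lower-bound sequence for f = m - 1 fast machines.

    Assumes p_1 = 1 and p_j = ell' / (m' - ell') * (p_1 + ... + p_{j-1}),
    and takes OPT_j = max(total / m', largest / ell').
    """
    s = F(s)
    m_p = (m - 1) * s + 1
    ell_p = s * ell
    jobs = [F(1)]
    while len(jobs) < n:
        jobs.append(ell_p / (m_p - ell_p) * sum(jobs))
    opts = [max(sum(jobs[:j]) / m_p, max(jobs[:j]) / ell_p) for j in range(1, n + 1)]
    return jobs, opts, m_p, ell_p
```

For $m = 5$, $\ell = 3$, $s = 2$ (so $m' = 9$, $\ell' = 6$) the sequence starts $1, 2, 6, 18$; from the third job on, each job is $m'/(m'-\ell') = 3$ times the previous one.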



We summarize our results in the following theorem. Let $s_1 = \left(m - 1 + \sqrt{m^2 - 2m + 1 + 4\ell}\right)/(2\ell)$.
Theorem 2. Consider the case of $m-1$ fast machines of speed $s$. If $s \ge s_1$, and either ($m/2 \le \ell \le m-2$ and $\mathcal{R}_{21} \le \ell'/(m'-\ell'-1)$) or $\ell = m-1$, then the optimal competitive ratio of any online algorithm is $\mathcal{R}_{21}$. If $s \le s_1$, $\ell \ge m/2$ and $\mathcal{R}_{22} \le \ell'/(m'-\ell'+s-1)$, then the optimal competitive ratio of any online algorithm is $\mathcal{R}_{22}$.
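Setting $\mathcal{R}_{21} = \mathcal{R}_{22}$ gives $m' = s\ell' = s^2\ell$, i.e. $\ell s^2 - (m-1)s - 1 = 0$, whose positive root is $s_1$. The floating-point sketch below (names hypothetical) evaluates both ratios and checks that they coincide at $s_1$, that $\mathcal{R}_{21}$ is the smaller one above $s_1$, and that $\mathcal{R}_{22}$ is the smaller one below it.

```python
import math


def ratios(m: int, ell: int, s: float):
    """(R_21, R_22) for f = m - 1 fast machines of speed s."""
    m_p = (m - 1) * s + 1          # m'
    ell_p = s * ell                # ell'
    k21 = s * (m - ell - 1)        # k' used with R_21
    k22 = s * (m - ell)            # k' used with R_22
    r21 = m_p ** 2 / (m_p ** 2 - m_p - k21 * ell_p)
    r22 = m_p ** 2 / (m_p ** 2 - k22 * ell_p)
    return r21, r22


def s1(m: int, ell: int) -> float:
    """Positive root of ell*s^2 - (m - 1)*s - 1 = 0."""
    return (m - 1 + math.sqrt(m * m - 2 * m + 1 + 4 * ell)) / (2 * ell)
```

For $m = 6$ and $\ell = 4$, $s_1 \approx 1.425$; at $s = 3$ the smaller ratio is $\mathcal{R}_{21} \approx 1.255$, and at $s = 1.1$ it is $\mathcal{R}_{22} \approx 1.297$.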

Corollary 1. For $f = \ell = m-1$, the optimal competitive ratio is $\min(\mathcal{R}_{21}, \mathcal{R}_{22})$.

Proof. For small $s$, if $\ell = m-1$ then the value of $\mathcal{R}_{21}$ is defined properly to be $1 + \frac{1}{s(m-1)}$, attained by the greedy algorithm that only uses fast machines. This ratio is thus tight.

For large $s$, if $\ell = m-1$ then the first property to be checked leads to the condition $s\mathcal{R}(\mathrm{OPT}_2 - \mathrm{OPT}_1) \le x$. Similarly to before, we can bound the difference of the optimal offline costs by $\mathrm{OPT}_2 - \mathrm{OPT}_1 \le x/(sm - s)$. Using (3), this leads to the condition $s^2(m-1)^2 \le (m-2)(sm - s + 1)^2$. This is true since $s(m-1) < sm - s + 1$ and $m \ge 3$. Thus the condition on the ratio in Theorem 2 is satisfied, as well as the condition on $\ell$. □

4.3 Few Splits on Identical Machines 

Following Theorem 1, we now consider the case $c < c_1$. Let

$$\mathcal{R}_3(c) = \frac{c^2 - c + 2 - (c-1)\sqrt{c^2 + 4}}{2}.$$

We examine algorithm $\textsc{High}(\ell/\mathcal{R}_3(c), \mathcal{R}_3(c))$, i.e. $k = \ell/\mathcal{R}_3(c)$, and verify that it maintains a competitive ratio of $\mathcal{R}_3(c)$. The second condition is immediately satisfied, since the only relevant case is $\mathrm{OPT}_2 - \mathrm{OPT}_1 \le x/\ell$, which leads to the constraint $k\mathcal{R}_3(c) \le \ell$ as in the previous subsection. Moreover, we have that $k + \ell \le m$ for all $c \le c_1$, since $c/\mathcal{R}_3(c) + c \le 1$ for $c \le c_1$.
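A numerical sketch of $\mathcal{R}_3$ (helper names hypothetical): it checks that $\mathcal{R}_3(0) = 2$, that $\mathcal{R}_3(c)$ is a root of $R^2 - (c^2 - c + 2)R + c = 0$, that $c/\mathcal{R}_3(c) + c < 1$ below $c_1$ with equality at $c_1$, and that $\mathcal{R}_3(c_1) = \mathcal{R}_1(c_1)$, so the ratios used for $c < c_1$ and $c \ge c_1$ meet at $c_1$.

```python
import math


def r3(c: float) -> float:
    """R_3(c) = (c^2 - c + 2 - (c - 1) * sqrt(c^2 + 4)) / 2."""
    return (c * c - c + 2 - (c - 1) * math.sqrt(c * c + 4)) / 2


def r1(c: float) -> float:
    """R_1(c) = 1 / (1 - c + c^2), the ratio used for c >= c_1."""
    return 1 / (1 - c + c * c)


def c1(tol: float = 1e-12) -> float:
    """Real root of c^3 - c^2 + 2c - 1 = 0 by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** 3 - mid ** 2 + 2 * mid - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```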

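Suppose a new job is placed on the $\ell$ machines with the lowest load. By the invariant and since $k + \ell \le m$, there are $k$ machines with load $\mathcal{R}_3(c)\mathrm{OPT}_1$. Denote the total load on the remaining machines (not the $k$ old machines or the $\ell$ machines that were just used) by $V$. Then

$$V \ge \big(W - k\mathcal{R}_3(c)\mathrm{OPT}_1\big)\cdot\frac{m - k - \ell}{m - k},$$

since these machines were not the least loaded machines before the new job arrived. Thus we need to check that

$$k\mathcal{R}_3(c)\cdot\mathrm{OPT}_1 + \ell\mathcal{R}_3(c)\cdot\mathrm{OPT}_2 + V \ge W + x,$$

or

$$k\mathcal{R}_3(c)\cdot\mathrm{OPT}_1\cdot\frac{\ell}{m-k} + \ell\mathcal{R}_3(c)\cdot\mathrm{OPT}_2 \ge W\cdot\frac{\ell}{m-k} + x.$$

As before, we use that $\mathrm{OPT}_1 \ge W/m$ and $\mathrm{OPT}_2 \ge \alpha\frac{W+x}{m} + (1-\alpha)\frac{x}{\ell}$ for any $0 \le \alpha \le 1$. We take $\alpha = \frac{m-\ell}{\mathcal{R}_3(c)(m-k)} \in [0,1]$.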
Fig. 1. Upper and lower bounds for identical machines. The horizontal axis is $\ell/m$, the vertical axis is the competitive ratio. The top line is the greedy algorithm, the middle line is our best upper bound and the lower line is our best lower bound. For $c < 1/2$, this lower bound also holds for randomized algorithms.



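We find

$$\frac{k\ell\,\mathrm{OPT}_1}{m-k} + \ell\,\mathrm{OPT}_2 \ge \frac{k\ell}{m-k}\cdot\frac{W}{m} + \ell\alpha\,\frac{W+x}{m} + (1-\alpha)x \ge \frac{1}{\mathcal{R}_3(c)}\left(\frac{\ell W}{m-k} + x\right),$$

since $\mathcal{R}_3(c)$ satisfies $\mathcal{R}_3(c) = \frac{2m - k - \ell + c\ell}{m}$ (using $k = \ell/\mathcal{R}_3(c) = cm/\mathcal{R}_3(c)$).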



Theorem 3. For $\ell/m < c_1$, the algorithm $\textsc{High}(\ell/\mathcal{R}_3(c), \mathcal{R}_3(c))$ maintains a competitive ratio of $\mathcal{R}_3(c)$, where $c = \ell/m$.



We now show a lower bound for this case. This lower bound uses a technique originally introduced by Sgall [17,18]. We omit the proof.



Theorem 4. For $m$ divisible by $\ell$, the competitive ratio of any randomized (or deterministic) algorithm is at least $\frac{1}{1 - (1 - \ell/m)^{m/\ell}}$. This gives a general lower bound of

$$\mathcal{R}_4(c) = \left(1 - (1-c)^{1/c}\right)^{-1} \quad\text{for } c = \ell/m.$$



We give an overview of the various upper and lower bounds in Figure 1.



5 Conclusion 

This paper considered the classical load balancing model in the context of parallelizable tasks. We designed and analyzed several algorithms, and showed tight bounds for many cases. As for open problems, there is a large body of work on various multiple-machine scheduling and load balancing problems. Many of those online (and offline) problems would be interesting to study in scenarios where parallelization is allowed.
For the special case of four machines and two parts, which is the smallest case for which we do not have a tight solution, we show in the full paper a lower bound of 1.37085 and an upper bound of 10/7. This is a better lower bound than Lemma 2, hinting that in areas where our bounds are not tight, the lower bound can be improved.
Abstract. It is well known that the Earliest-Deadline-First (EDF) and
the Least-Laxity-First (LLF) algorithms are optimal algorithms for the 
problem of preemptively scheduling jobs that arrive over time on a single 
machine to minimize maximum lateness. It was not previously known 
what other online algorithms are optimal for this problem. A complete 
characterization of all optimal online algorithms for this problem is given. 



1 Introduction 

We consider the problem of preemptively scheduling jobs that arrive over time 
on a single machine to minimize maximum lateness. This problem is denoted 
$1|r_j,\mathit{pmtn}|L_{\max}$ according to Graham et al.'s notation [4]. It is well known that
the Earliest-Deadline-First (EDF) algorithm and the Least-Laxity-First (LLF) 
algorithm are optimal for this problem [3,2]. EDF always runs an available job 
with the smallest deadline. LLF always runs an available job with the smallest 
laxity where the laxity of a job at time t is defined as the difference between 
its deadline and the sum of t and the remaining processing time of the job at 
time t. The laxity of a job indicates, on a job-by-job basis, how much the job 
can be delayed without being late. Both algorithms are online algorithms, which 
construct a schedule over time without knowledge of the existence of jobs that 
have not arrived. 

Evidently, there are other online algorithms which are optimal for this prob- 
lem as well. However, an exact characterization of these algorithms was not pre- 
viously known. This is in sharp contrast with another fundamental problem, the 
problem of preemptively scheduling jobs that arrive over time on a single machine 
to minimize the total completion time. This problem is denoted $1|r_j,\mathit{pmtn}|\sum C_j$
according to Graham et al.'s notation [4]. The Smallest-Remaining-Processing-
Time (SRPT) algorithm is optimal for this problem [1]. SRPT always runs an 
available job with the smallest remaining processing time. For an algorithm to 
be optimal for the total completion time problem, it must follow the SRPT rule 
at any time. Any algorithm that deviates from it is not optimal. The rule is very 
rigid. Therefore, in fact, there is only one algorithm that is optimal for the total completion time problem. In contrast, there are at least two distinct optimal algorithms, namely EDF and LLF, and presumably many more, for the maximum lateness problem. Evidently, the governing rule that guarantees optimality for the maximum lateness problem is more flexible than the one for the total completion time problem, and EDF and LLF are merely manifestations of this underlying rule.

This rule has been discovered and is presented in this paper. It turns out that the general idea behind LLF is right. However, to be absolutely right, the remaining processing time of other jobs must be taken into account. This leads to the definition of the "compound laxity" of a job. This, in turn, leads to the "compound laxity rule" (CL rule) for choosing a job to run at any time. Interestingly, this rule has a flavor of EDF. The result is a complete characterization of all optimal online algorithms for the maximum lateness problem: for an online algorithm to be optimal, it must always follow the CL rule. To the best of the author's knowledge, this is the first non-trivial result of this sort.

Besides giving a better understanding of a fundamental problem and the discovery of its underlying optimal rule, this work may also lead to future research. One direction is to study problems of optimizing a second objective function subject to the constraint that the maximum lateness is minimum. With all the optimal online algorithms for the maximum lateness criterion identified, finding an algorithm to optimize the second objective will be more approachable. An example of work along this line is [6].

This work might be a basic building block for other, more complex problems. One such problem is preemptively scheduling jobs that arrive over time on identical parallel machines so that all jobs finish by their deadlines. An online algorithm is "admissible" if it can produce a feasible schedule whenever the optimal algorithm can. An open question is whether there is an online admissible algorithm that uses $cm$ machines for some constant $c$ while the optimal algorithm uses only $m$ machines [5]. The CL rule might help us identify a good algorithm for this problem. However, this is still an open question.

The rest of the paper is organized as follows. In Section 2, the definition of the problem and other quantities are given, and the compound laxity and the compound laxity rule are defined. Section 3 furnishes the relationship between compound laxity and lateness. The main results are in Section 4.

2 Definitions 

The problem of preemptively scheduling jobs that arrive over time on a single machine to minimize maximum lateness is considered. This problem is denoted $1|r_j,\mathit{pmtn}|L_{\max}$ according to Graham et al.'s notation [4]. An input instance in this problem consists of $n$ jobs. Job $j$ arrives at time $r_j$, has a processing time $p_j$, and a due date $d_j$. An algorithm for this problem must schedule the jobs preemptively on one machine. Let $C_j^S$ denote the completion time of job $j$ in schedule $S$. The lateness of job $j$ in schedule $S$, denoted $L_j^S$, is defined as $L_j^S = C_j^S - d_j$. Note that the lateness of a job could be negative if it finishes before its deadline. The maximum lateness of schedule $S$, denoted $L_{\max}^S$, is defined as $L_{\max}^S = \max_j L_j^S$. The goal is to find a schedule with the smallest maximum lateness. For any input instance $I$, let $L_{\max}^*(I)$ denote the maximum lateness in an optimal schedule for $I$. An algorithm is online if it is not aware of the existence of jobs that have not arrived. When a job arrives, its processing time and due date become known to the online algorithm. An algorithm is offline if it is aware of all jobs and their parameters in advance. Let $A(I)$ denote the schedule produced by algorithm $A$ for input instance $I$.

For any non-empty subset $X$ of jobs in $I$, let

— $r(X) = \min_{i \in X} r_i$,

— $p(X) = \sum_{i \in X} p_i$, and

— $d(X) = \max_{i \in X} d_i$.

For any input instance $I$, any schedule $S$ for $I$, and any times $s, t$ where $r_j \le s \le t \le C_j$, let

— $p_j^S(t)$ be the remaining processing time of job $j$ at time $t$ in schedule $S$,

— $q_j^S(t)$ be the amount of work done on job $j$ by time $t$ in schedule $S$,

— $p^S(X,t) = \sum_{j \in X} p_j^S(t)$,

— $q_j^S(s,t) = q_j^S(t) - q_j^S(s)$, and

— $q^S(X,s,t) = \sum_{j \in X} q_j^S(s,t)$.

By convention, if a job runs continuously from time $s$ to time $t$, we say that it runs in the closed-open interval $[s,t)$. Thus, $q_i^S(s,t)$ is the amount of time the machine spends on job $i$ during the interval $[s,t)$.

— Let $B_j(t)$ be the set of jobs $i$ such that $r_i \le t$ and $d_i \le d_j$.

— Let $B_j(s,t) = B_j(t) - B_j(s)$ for any times $s$ and $t$ such that $s < t$.

— Let $B_{j,k}(t) = B_k(t) - B_j(t)$ for any jobs $j$ and $k$ such that $d_j \le d_k$.

— Define the compound laxity of a job $j$ at time $t$ in schedule $S$, denoted $l_j^S(t)$, as $l_j^S(t) = d_j - t - p^S(B_j(t),t)$.

— A job is available at time $t$ if it has arrived but not completed at time $t$.

— Define the critical compound laxity at time $t$ in schedule $S$, denoted $l_{\mathrm{crit}}^S(t)$, as $l_{\mathrm{crit}}^S(t) = \min_j l_j^S(t)$, where the minimum is taken over the set of available jobs at time $t$ in schedule $S$.

— Let $I_{\mathrm{crit}}^S(t)$ be the set of all jobs $i$ such that $l_i^S(t) = l_{\mathrm{crit}}^S(t)$.

— Define the critical deadline at time $t$ in schedule $S$, denoted $d_{\mathrm{crit}}^S(t)$, as $d_{\mathrm{crit}}^S(t) = \min_{i \in I_{\mathrm{crit}}^S(t)} d_i$.

The set $B_j(t)$ is the set of jobs that have arrived by time $t$ with a deadline no later than $d_j$. The set $B_j(s,t)$ contains only jobs that arrive in the interval $(s,t]$. The set $B_{j,k}(t)$ contains only jobs whose deadlines are in the interval $(d_j,d_k]$. Note that $B_j(t)$, $B_j(s,t)$, and $B_{j,k}(t)$ depend only on the input instance and are independent of the schedule. Note that $d_i = d_j$ implies that $B_i(t) = B_j(t)$ and $l_i^S(t) = l_j^S(t)$. The set $I_{\mathrm{crit}}^S(t)$ is the set of jobs whose compound laxity is critical, and the critical deadline $d_{\mathrm{crit}}^S(t)$ is the earliest deadline among such jobs.

A CL algorithm always runs an available job whose deadline is no later than the critical deadline. In other words, an online algorithm $A$ is a CL algorithm if for any input instance $I$ and at any time $t$, algorithm $A$ runs an available job $j$ such that $d_j \le d_{\mathrm{crit}}^S(t)$. We also say that algorithm $A$ follows the CL rule.

A schedule $S$ for an input instance $I$ is a CL schedule if it can be produced by following the CL rule.
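The compound laxity and the CL rule can be evaluated on a snapshot of a schedule. The Python sketch below is illustrative only: the `Job` record, the dictionary `remaining` (job name to remaining processing time $p_j(t)$) and the function names are assumptions, and $I_{\mathrm{crit}}(t)$ is restricted to available jobs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Job:
    name: str
    r: float  # release time r_j
    p: float  # processing time p_j
    d: float  # due date d_j


def compound_laxity(jobs: List[Job], remaining: Dict[str, float], j: Job, t: float) -> float:
    """l_j(t) = d_j - t - p(B_j(t), t), with B_j(t) = {i : r_i <= t and d_i <= d_j}."""
    backlog = sum(remaining[i.name] for i in jobs if i.r <= t and i.d <= j.d)
    return j.d - t - backlog


def critical_deadline(jobs: List[Job], remaining: Dict[str, float], t: float) -> float:
    """d_crit(t): earliest deadline among available jobs whose laxity equals l_crit(t)."""
    available = [j for j in jobs if j.r <= t and remaining[j.name] > 0]
    lax = {j.name: compound_laxity(jobs, remaining, j, t) for j in available}
    l_crit = min(lax.values())
    return min(j.d for j in available if lax[j.name] == l_crit)


def follows_cl_rule(jobs: List[Job], remaining: Dict[str, float], t: float, chosen: Job) -> bool:
    """True if `chosen` is available at t and d_chosen <= d_crit(t)."""
    is_available = chosen.r <= t and remaining[chosen.name] > 0
    return is_available and chosen.d <= critical_deadline(jobs, remaining, t)
```

For jobs A $(r, p, d) = (0, 3, 5)$, B $(0, 1, 8)$ and C $(0, 2, 4)$ at time 0, $l_A(0) = 0$ is critical, so $d_{\mathrm{crit}}(0) = 5$: running C (the EDF choice) or A obeys the CL rule, while running B (deadline 8) does not.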

At any time $t$, the Earliest-Deadline-First (EDF) algorithm runs an available job $j$ such that $d_j = \min_i d_i$, where the minimum is taken over all available jobs [3]. At any time $t$, the Least-Laxity-First (LLF) algorithm runs an available job $j$ such that $d_j - t - p_j(t) = \min_i \big(d_i - t - p_i(t)\big)$, where the minimum is taken over all available jobs [2]. Both EDF and LLF are optimal for this problem. We will show in Section 4 that EDF and LLF follow the CL rule. In the following, we will drop the superscript $S$ for quantities defined in this section if it is clear from the context.
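The claim that EDF follows the CL rule can be tested by simulation. The sketch below uses a simplification assumed here, not the paper's continuous-time model: integer release and processing times (each $p_j \ge 1$), preemption only at integer times, and EDF ties broken by job index. At every unit step it recomputes $d_{\mathrm{crit}}(t)$ and checks that the job chosen by EDF has deadline at most $d_{\mathrm{crit}}(t)$. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Job = Tuple[int, int, int]  # (r_j, p_j, d_j) as integers, with p_j >= 1


def compound_laxity(jobs: List[Job], rem: Dict[int, int], j: int, t: int) -> int:
    """l_j(t) = d_j - t - remaining work of jobs i with r_i <= t and d_i <= d_j."""
    d_j = jobs[j][2]
    return d_j - t - sum(rem[i] for i, (r, _, d) in enumerate(jobs) if r <= t and d <= d_j)


def edf_follows_cl(jobs: List[Job]) -> Tuple[bool, Dict[int, int]]:
    """Simulate preemptive EDF in unit steps and test the CL rule at each step.

    Returns (True if every step obeyed the CL rule, completion time of each job).
    """
    assert all(p >= 1 for _, p, _ in jobs)
    rem = {i: p for i, (_, p, _) in enumerate(jobs)}
    completion: Dict[int, int] = {}
    t, ok = 0, True
    while len(completion) < len(jobs):
        avail = [i for i, (r, _, _) in enumerate(jobs) if r <= t and rem[i] > 0]
        if not avail:
            t += 1  # idle until the next release
            continue
        lax = {i: compound_laxity(jobs, rem, i, t) for i in avail}
        l_crit = min(lax.values())
        d_crit = min(jobs[i][2] for i in avail if lax[i] == l_crit)
        j = min(avail, key=lambda i: (jobs[i][2], i))  # EDF, ties by index
        ok = ok and jobs[j][2] <= d_crit
        rem[j] -= 1  # run job j during [t, t + 1)
        t += 1
        if rem[j] == 0:
            completion[j] = t
    return ok, completion
```

On the instance $(r, p, d) = (0,3,5), (0,1,8), (0,2,4), (2,2,6)$ every step passes, and jobs 2, 0, 3, 1 complete at times 2, 5, 7, 8.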

3 Basic Understanding of Compound Laxity 

In this section, basic properties of compound laxity and lateness are given. Except for Lemma 1, which applies only to optimal schedules, all results in this section apply to any schedule. Lemma 2 establishes a relationship between the compound laxities of two different jobs. Lemma 3 tells us how the compound laxity of a job changes over time. An implication of Lemma 3 is that the compound laxity of a job never increases over time. This leads to Corollary 1, which states that the compound laxity of a job is smallest when it completes. Lemma 4 establishes a relationship between the compound laxity of a job and its lateness. Lemma 5 states that the maximum lateness equals the negation of the minimum, over all jobs, of the compound laxity at completion.

Lemma 1. For any input instance $I$, $\max_{X \subseteq I} \big(r(X) + p(X) - d(X)\big) \le L_{\max}^*(I)$, where $X$ is any non-empty subset of jobs in $I$.

Proof. Consider any input instance $I$. Let $S$ be an optimal schedule for $I$. Let $X$ be any non-empty subset of jobs in $I$. All jobs in $X$ start no earlier than $r(X)$. The total processing time of jobs in $X$ is $p(X)$. Suppose job $j$ is the last job in $X$ to finish in schedule $S$. Thus, $C_j \ge r(X) + p(X)$. The latest due date of jobs in $X$ is $d(X)$. In particular, $d_j \le d(X)$. Thus, $r(X) + p(X) - d(X) \le C_j - d_j = L_j^S \le L_{\max}^S = L_{\max}^*(I)$. □
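The bound of Lemma 1 can be computed by brute force on tiny instances. The sketch below (function name hypothetical) enumerates all $2^n - 1$ non-empty subsets, so it is meant only as an illustration.

```python
from itertools import combinations
from typing import List, Tuple


def lateness_lower_bound(jobs: List[Tuple[int, int, int]]) -> int:
    """max over non-empty X of r(X) + p(X) - d(X); jobs are (r_j, p_j, d_j)."""
    best = None
    for size in range(1, len(jobs) + 1):
        for X in combinations(jobs, size):
            value = min(r for r, _, _ in X) + sum(p for _, p, _ in X) - max(d for _, _, d in X)
            best = value if best is None else max(best, value)
    return best
```

For the jobs $(r, p, d) = (0,3,5), (0,1,8), (0,2,4), (2,2,6)$ the maximum is 1, attained by the first, third and fourth jobs.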

Lemma 2. For any input instance $I$, any schedule $S$ for $I$, any jobs $i$ and $j$ in $I$ such that $d_i \le d_j$, and any time $t$ such that $r_i \le t \le C_i$ and $r_j \le t \le C_j$, it is the case that $l_j(t) - l_i(t) = d_j - d_i - p(B_{i,j}(t),t)$.

Proof.

$$\begin{aligned} l_j(t) - l_i(t) &= [d_j - t - p(B_j(t),t)] - [d_i - t - p(B_i(t),t)] \\ &= d_j - d_i - [p(B_j(t),t) - p(B_i(t),t)] \\ &= d_j - d_i - p(B_{i,j}(t),t) \end{aligned}$$

□
Lemma 3. For any input instance $I$, any schedule $S$ for $I$, any job $j$ in $I$, and any times $s$ and $t$ such that $r_j \le s \le t \le C_j$, it is the case that $l_j(t) = l_j(s) - p(B_j(s,t)) - \big(t - s - q(B_j(t),s,t)\big)$.

Two factors control the change of the compound laxity of a job $j$ over time: (1) the arrival of new jobs $i$ with $d_i \le d_j$ and (2) the amount of time spent (or not spent) on available jobs with $d_i \le d_j$. Lemma 3 states that the compound laxity of a job at time $t$ decreases from that at time $s$ by an amount equal to the sum of the total processing time of new jobs that arrive during the interval $(s,t]$ and the amount of time during $[s,t)$ that is NOT spent on jobs in $B_j(t)$.

Proof.

$$\begin{aligned}
l_j(s) - l_j(t) &= [d_j - s - p(B_j(s),s)] - [d_j - t - p(B_j(t),t)] \\
&= (t-s) - [p(B_j(s),s) - p(B_j(t),t)] \\
&= (t-s) - [p(B_j(s),s) - p(B_j(s),t) - p(B_j(s,t),t)] && \text{by definition of } B_j(s,t) \\
&= (t-s) - q(B_j(s),s,t) + p(B_j(s,t),t) && \text{by definition of } q(X,s,t) \\
&= (t-s) - q(B_j(s),s,t) + p(B_j(s,t)) - q(B_j(s,t),t) && \text{by definition of } p(X,t) \text{ and } q(X,t) \\
&= (t-s) - q(B_j(t),s,t) + p(B_j(s,t)) && \text{because } q(B_j(s,t),t) = q(B_j(s,t),s,t)
\end{aligned}$$

□



Corollary 1. For any input instance $I$, any schedule $S$ for $I$, and any job $j$ in $I$, it is the case that $\min_{r_j \le t \le C_j} l_j(t) = l_j(C_j)$.

Proof. Since the machine can spend at most $t - s$ time units on jobs in $B_j(t)$ during $[s,t)$, we have $q(B_j(t),s,t) \le t - s$. Thus, from Lemma 3, for times $s$ and $t$ such that $r_j \le s \le t \le C_j$, it is the case that $l_j(t) = l_j(s) - p(B_j(s,t)) - \big(t - s - q(B_j(t),s,t)\big) \le l_j(s)$. In other words, $l_j(\cdot)$ can never increase over time. Since $C_j$ is the latest time in the interval $[r_j, C_j]$, the result follows. □



Lemma 4. Consider any input instance $I$, any schedule $S$ for $I$, and any job $j$ in $I$.

(a) $L_j = -l_j(C_j) - p(B_j(C_j),C_j) \le -l_j(C_j)$.

(b) $L_j = -l_j(C_j)$ if and only if job $j$ is the last job in $B_j(C_j)$ to complete.

(c) If $L_j < -l_j(C_j)$, then $L_x \ge -l_j(C_j) + (d_j - d_x) \ge -l_j(C_j)$, where $x$ is the last job in $B_j(C_j)$ to complete.

Proof. For part (a), $L_j = C_j - d_j = -\big(d_j - C_j - p(B_j(C_j),C_j)\big) - p(B_j(C_j),C_j) = -l_j(C_j) - p(B_j(C_j),C_j)$. Next we show part (b). Job $j$ is the last job in $B_j(C_j)$ to complete if and only if $p(B_j(C_j),C_j) = 0$. This follows directly from the definition of $p(X,t)$. Thus, $L_j = -l_j(C_j)$ if and only if job $j$ is the last job in $B_j(C_j)$ to complete. Finally, we show part (c). If $L_j < -l_j(C_j)$, then job $j$ is not the last job in $B_j(C_j)$ to complete. Then $p(B_j(C_j),C_j) > 0$. Let $x$ be the last job in $B_j(C_j)$ to complete. Thus, $C_x \ge C_j + p(B_j(C_j),C_j)$. From the definition of $B_j(C_j)$, it is the case that $d_x \le d_j$. Thus, $L_x = C_x - d_x \ge C_j + p(B_j(C_j),C_j) - d_j + (d_j - d_x) = -l_j(C_j) + (d_j - d_x) \ge -l_j(C_j)$. □



Lemma 5. For any input instance $I$ and any schedule $S$ for $I$, it is the case that $L_{\max} = -\min_i l_i(C_i)$. Furthermore, for any job $k$, it is the case that $L_k = L_{\max}$ if and only if $l_k(C_k) = \min_i l_i(C_i)$ and $k$ is the last job to finish in $B_k(C_k)$.

Proof. Let $k$ be any job such that $L_k = L_{\max}$. First, we will show that $L_k \ge -\min_i l_i(C_i)$. Assume to the contrary that $L_k < -\min_i l_i(C_i)$. Let $j$ be a job such that $l_j(C_j) = \min_i l_i(C_i)$. Thus, $L_j \le L_{\max} = L_k < -\min_i l_i(C_i) = -l_j(C_j)$. Thus, from Lemma 4 part (c), there exists a job $x$ such that $L_x \ge -l_j(C_j)$. Thus, $L_{\max} < -l_j(C_j) \le L_x$, a contradiction. From Lemma 4 part (a), it is the case that $L_k \le -l_k(C_k)$. Thus, $L_k \le -l_k(C_k) \le -\min_i l_i(C_i) \le L_k$. Hence, $L_{\max} = L_k = -l_k(C_k) = -\min_i l_i(C_i)$. From Lemma 4 part (b), job $k$ is the last job in $B_k(C_k)$ to finish.

To prove the other direction, assume that $k$ is the last job to finish in $B_k(C_k)$ and $l_k(C_k) = \min_i l_i(C_i)$. Assume, to reach a contradiction, that $L_k < L_{\max}$. Then $L_k < L_{\max} = -\min_i l_i(C_i) = -l_k(C_k)$. By Lemma 4 part (b), this means that $k$ is not the last job in $B_k(C_k)$ to finish, a contradiction. Thus, $L_k = L_{\max}$. □
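Lemma 5 holds for every schedule, not only optimal ones, so it can be checked on a deliberately poor schedule. The sketch below (unit-step simulation with integer data and $p_j \ge 1$, an assumption made here; function name hypothetical) always runs the available job with the smallest value of a given priority key, records $L_j = C_j - d_j$ and $l_j(C_j) = d_j - C_j - p(B_j(C_j), C_j)$ at each completion, and returns $L_{\max}$ together with $-\min_j l_j(C_j)$.

```python
from typing import Callable, Dict, List, Tuple

Job = Tuple[int, int, int]  # (r_j, p_j, d_j) as integers, with p_j >= 1


def check_lemma5(jobs: List[Job], priority: Callable[[int], object]) -> Tuple[int, int]:
    """Build a unit-step preemptive schedule that always runs the available job
    with the smallest priority(i); return (L_max, -min_j l_j(C_j))."""
    assert all(p >= 1 for _, p, _ in jobs)
    rem = {i: p for i, (_, p, _) in enumerate(jobs)}
    lateness: Dict[int, int] = {}
    laxity_at_completion: Dict[int, int] = {}
    t = 0
    while len(lateness) < len(jobs):
        avail = [i for i, (r, _, _) in enumerate(jobs) if r <= t and rem[i] > 0]
        if not avail:
            t += 1
            continue
        j = min(avail, key=priority)
        rem[j] -= 1
        t += 1
        if rem[j] == 0:  # job j completes at C_j = t
            d_j = jobs[j][2]
            # p(B_j(C_j), C_j): remaining work of released jobs with d_i <= d_j
            backlog = sum(rem[i] for i, (r, _, d) in enumerate(jobs) if r <= t and d <= d_j)
            lateness[j] = t - d_j
            laxity_at_completion[j] = d_j - t - backlog
    return max(lateness.values()), -min(laxity_at_completion.values())
```

On the instance $(r, p, d) = (0,3,5), (0,1,8), (0,2,4), (2,2,6)$, a largest-deadline-first priority gives 4 for both quantities, and EDF gives 1 for both.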

4 Results on Compound Laxity Algorithms 

In this section, results on CL algorithms are given. Lemma 6 is the main lemma, which establishes a relationship similar to that in Lemma 1. Lemma 1 and Lemma 6 imply Theorem 1, which says that CL algorithms are optimal. In Lemma 7, EDF and LLF are shown to be CL algorithms. This implies that EDF and LLF are optimal, as stated in Corollary 2. Corollary 3 states a basic property of EDF, which is used in Theorem 2 to show that non-CL algorithms are not optimal.

Lemma 6. For any input instance $I$ and any CL algorithm $A$, $L_{\max}^{A(I)} \le \max_{X \subseteq I} \big(r(X) + p(X) - d(X)\big)$, where $X$ is any non-empty subset of jobs in $I$.

Proof. Fix a CL algorithm $A$. Consider any input instance $I$. Suppose $S$ is the schedule produced by $A$ for $I$. Let $k$ be a job such that $L_k = L_{\max}$. On a tie, choose one with the largest deadline. Note that if there is still more than one such job, at most one of them will have a non-zero processing time, and all of them complete at the same time. Consider the job set $B_k(C_k)$. It is the set of jobs $i$ such that $r_i \le C_k$ and $d_i \le d_k$.
Let $t$ be the smallest time such that the machine continuously and exclusively runs jobs in $B_k(C_k)$ in the interval $[t, C_k)$ in schedule $S$. Note that before time $t$, the machine is either idle or running a job not in $B_k(C_k)$. Also, $t$ could possibly be 0 if the machine has been running jobs in $B_k(C_k)$ since time 0. We will show that $t < C_k$. Job $k$ completes at time $C_k$. There must exist some time $t'$ such that $t' < C_k$ and the machine continuously processes job $k$ in the interval $[t', C_k)$. Obviously, $k \in B_k(C_k)$. Thus, $t \le t' < C_k$.

Let $Y$ be the set of jobs $i$ in $B_k(C_k)$ such that $t \le r_i$, that is, $i \in Y$ if and only if $t \le r_i \le C_k$ and $d_i \le d_k$. Note that $Y$ may not be equal to $B_k(t, C_k)$, as the latter does not include jobs released exactly at time $t$.

We claim that during $[t, C_k)$ the machine continuously and exclusively processes jobs in $Y$. This implies that $k \in Y$. Thus, $Y$ is non-empty. From the definition of $Y$, it follows that $r(Y) \ge t$ and $d(Y) \le d_k$. The former and the claim imply that $p(Y) \ge C_k - r(Y)$. Hence, $L_{\max} = L_k = C_k - d_k \le r(Y) + p(Y) - d(Y) \le \max_{X \subseteq I} \big(r(X) + p(X) - d(X)\big)$, which is the statement of the lemma.

It remains to show that during $[t, C_k)$ the machine continuously and exclusively processes jobs in $Y$. In other words, if $A$ runs a job $i$ (for a non-zero amount of time) in the interval $[t, C_k)$, then $t \le r_i \le C_k$ and $d_i \le d_k$. From the definitions of $t$ and $B_k(C_k)$, we have $d_i \le d_k$ and $r_i \le C_k$. In the rest of the proof, we show that $r_i \ge t$.

Assume, to reach a contradiction, that some jobs $i$ in $B_k(C_k)$ run in the interval $[t, C_k)$ and have $r_i < t$. Let $u$ be a job with the largest deadline among such jobs. Consider the time interval $[r_u, t)$. Since algorithm $A$, being a CL algorithm, does not insert idle time unnecessarily, the machine is busy during $[r_u, t)$. Since $u \in B_k(C_k)$,

$$d_u \le d_k. \tag{1}$$

Let $v$ and $s$ be the job and the smallest time, respectively, such that algorithm $A$ runs job $v$ continuously during $[s, t)$ and no jobs arrive during $(s, t)$. It must be the case that $d_k < d_v$; otherwise, the choice of $t$ would be contradicted.

Let $w$ be a job such that $d_w = d_{\mathrm{crit}}(s)$. Since algorithm $A$ follows the CL rule, $d_v \le d_w$. Thus, $d_u \le d_k < d_v \le d_w$. It must be the case that $C_k < C_w$, because job $w$ is not completed by time $s$ and it never runs in the interval $[s, C_k)$.

Next, we show that $l_w(s) \le l_u(s) - (t - s)$. Assume to the contrary that $l_w(s) = l_u(s) - \Delta$ for some $\Delta$ with $0 \le \Delta < t - s$. Then $l_u(s + \Delta) = l_u(s) - \Delta = l_w(s) = l_w(s + \Delta)$, where the first and last equalities follow from Lemma 3, from the fact that algorithm $A$ runs job $v$ continuously in the interval $[s, s + \Delta)$, and from the fact that no jobs arrive during $(s, t)$. By a similar argument, for any job $i$ with $d_v \le d_i$, it follows that $l_i(s + \Delta) = l_i(s) \ge l_w(s) = l_w(s + \Delta)$. We can show that $d_{\mathrm{crit}}(s + \Delta) < d_v$. If $d_v \le d_{\mathrm{crit}}(s + \Delta)$, then $l_{\min}(s + \Delta) = l_i(s + \Delta)$ for some job $i$ with $d_v \le d_i$. However, $l_i(s + \Delta) \ge l_w(s + \Delta) = l_u(s + \Delta)$. Thus, $l_u(s + \Delta) = l_{\min}(s + \Delta)$, which contradicts $d_u < d_v \le d_{\mathrm{crit}}(s + \Delta)$. If $d_{\mathrm{crit}}(s + \Delta) < d_v$, then job $v$ should not be chosen to run at time $s + \Delta$, contradicting the fact that $v$ runs continuously in the interval $[s, t)$. Therefore,

$$d_w - d_u \le p(B_{u,w}(s), s) - (t - s) = p(B_{u,w}(s), t), \tag{2}$$

where the inequality follows from Lemma 2 (with $i = u$, $j = w$, and $t = s$) and from $l_w(s) - l_u(s) \le -(t - s)$, and the equality follows from the fact that the machine continuously processes job $v$ during $[s, t)$ and $v \in B_{u,w}(s)$.

We claim that all jobs in $B_{u,k}(s)$ complete by time $s$, i.e., $p(B_{u,k}(s), s) = 0$. Assume to the contrary that there is a job $x$ with $r_x \le s$, $d_u < d_x \le d_k$, and $p_x(s) > 0$. Since $L_k = L_{\max}$, Lemma 5 gives $L_k = -l_k(C_k)$, and by Lemma 4 part (b), job $k$ is the last job in $B_k(C_k)$ to complete. This implies that job $x$ must complete by time $C_k$. Job $x$ cannot run during $[s, t)$ because $v$ is running in that interval. Thus, $x$ must run for some time in the interval $[t, C_k)$. Since $r_x \le s < t$ and $d_u < d_x$, this contradicts the choice of $u$. Thus, $x$ does not exist, and $p(B_{u,k}(s), s) = 0$ as claimed.



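$$\begin{aligned}
p(B_{u,w}(s), t) &= p(B_{u,k}(s), t) + p(B_{k,w}(s), t) \\
&= p(B_{k,w}(s), t) && \text{because } 0 \le p(B_{u,k}(s), t) \le p(B_{u,k}(s), s) = 0 \\
&= p(B_{k,w}(s), C_k) && \text{because jobs } i \text{ with } d_i > d_k \text{ do not run during } [t, C_k) \\
&\le p(B_{k,w}(C_k), C_k) && \text{because } s < C_k
\end{aligned} \tag{3}$$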

Thus, from inequalities (1), (2), and (3), it follows that $d_w - d_k \le d_w - d_u \le p(B_{u,w}(s), t) \le p(B_{k,w}(C_k), C_k)$. Hence, from Lemma 2 (with $i = k$, $j = w$, and $t = C_k$), it follows that $l_w(C_k) \le l_k(C_k)$. Then $-l_w(C_w) \ge -l_w(C_k) \ge -l_k(C_k) = -\min_i l_i(C_i) \ge -l_w(C_w)$, where the first inequality follows from Corollary 1 and the equality follows from Lemma 5. Thus, all of these quantities are equal. However, $d_k < d_w$, which contradicts the choice of $k$. Thus, the earlier assumption that $u$ exists is false, and this completes the proof. □



Theorem 1. For any input instance $I$ and any CL algorithm $A$, it is the case that $L^A_{\max} = \max_{X \subseteq I} \{ r(X) + p(X) - d(X) \} = L^*_{\max}$. In other words, any online algorithm $A$ that is a CL algorithm is optimal for the problem $1|r_j, pmtn|L_{\max}$.

Proof. This follows from Lemma 1, Lemma 6, and the fact that $L^*_{\max} \le L^A_{\max}$. □
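Theorem 1 can be checked numerically on small instances. The sketch below is an illustration, not the authors' code, and rests on assumptions: it takes $r(X)$, $p(X)$, and $d(X)$ to be the earliest release time, the total processing time, and the latest deadline of $X$ (the usual meaning of these quantities, defined earlier in the paper), uses integer data with positive processing times, and simulates preemptive EDF in unit time steps. With integer data EDF only changes its decision at integer release and completion times, so the unit-step simulation is exact. The function names are hypothetical.

```python
from itertools import combinations

Job = tuple[int, int, int]  # (release r_j, processing time p_j >= 1, deadline d_j)


def edf_max_lateness(jobs: list[Job]) -> int:
    """Preemptive EDF simulated in unit time steps; returns L_max = max_j (C_j - d_j)."""
    remaining = [p for _, p, _ in jobs]
    completion = [0] * len(jobs)
    t = 0
    while any(remaining):
        ready = [j for j, (r, _, _) in enumerate(jobs) if r <= t and remaining[j] > 0]
        if ready:  # otherwise the machine idles until the next release
            j = min(ready, key=lambda i: jobs[i][2])  # earliest deadline first
            remaining[j] -= 1
            if remaining[j] == 0:
                completion[j] = t + 1
        t += 1
    return max(c - d for c, (_, _, d) in zip(completion, jobs))


def subset_bound(jobs: list[Job]) -> int:
    """max over non-empty X of r(X) + p(X) - d(X), by brute force over all subsets."""
    return max(
        min(r for r, _, _ in X) + sum(p for _, p, _ in X) - max(d for _, _, d in X)
        for size in range(1, len(jobs) + 1)
        for X in combinations(jobs, size)
    )


jobs = [(0, 3, 4), (1, 2, 3), (2, 2, 9)]
print(edf_max_lateness(jobs), subset_bound(jobs))  # 1 1
```

In the example the bound is attained by the two jobs released at times 0 and 1, which need 5 units of work by the later deadline 4. The subset enumeration is exponential and only meant to make the equality of Theorem 1 visible.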



Lemma 7. Algorithms EDF and LLF are CL algorithms. 

Proof. EDF always runs a job with the smallest deadline, which is no larger than 
the critical deadline. Thus, EDF always follows the CL rule. 

Now we show that LLF is a CL algorithm. Suppose to the contrary that there exist an input instance $I$ and a time $t$ at which LLF violates the CL rule, that is, LLF chooses to run a job with a deadline larger than the critical deadline. Let $j$ be the job that LLF chooses to run at time $t$. Thus, $d_j - t - p^{LLF}_j(t) = \min_i \{ d_i - t - p^{LLF}_i(t) \}$. Let $k$ be a job such that $d_k = d_{\mathrm{crit}}(t)$. Since LLF violates the CL rule at time $t$, $d_k < d_j$.



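$$\begin{aligned}
d_j - t - p^{LLF}_j(t) &\le d_k - t - p^{LLF}_k(t) \\
d_j - d_k &\le p^{LLF}_j(t) - p^{LLF}_k(t) && (4)
\end{aligned}$$

$$\begin{aligned}
l^{LLF}_k(t) &\le l^{LLF}_j(t) \\
d_k - t - p^{LLF}(B_k(t), t) &\le d_j - t - p^{LLF}(B_j(t), t) \\
d_k - p^{LLF}(B_k(t), t) &\le d_j - p^{LLF}(B_j(t), t) \\
d_j - d_k &\ge p^{LLF}(B_j(t), t) - p^{LLF}(B_k(t), t) \\
d_j - d_k &\ge p^{LLF}(B_{k,j}(t), t) && (5)
\end{aligned}$$

$$\begin{aligned}
p^{LLF}_j(t) - p^{LLF}_k(t) &\ge p^{LLF}(B_{k,j}(t), t) && \text{from (4) and (5)} \\
&\ge p^{LLF}_j(t) && \text{because } r_j \le t \text{ and } d_k < d_j \\
&> p^{LLF}_j(t) - p^{LLF}_k(t) && \text{because job } k \text{ has not finished by time } t.
\end{aligned}$$

A contradiction. □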
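The quantities in this proof can be computed directly. The sketch below takes $B_k(t)$ to be the released jobs with deadline at most $d_k$ and $l_k(t) = d_k - t - p(B_k(t), t)$, as in the derivation of (5). It additionally assumes that the minimum defining $d_{\mathrm{crit}}(t)$ ranges over released jobs and that ties go to the smaller deadline; both are assumptions of this sketch, since the exact definitions appear earlier in the paper. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobState:
    release: float
    deadline: float
    remaining: float  # p_i(t): processing time still needed at the current time t


def compound_laxity(jobs: list[JobState], k: int, t: float) -> float:
    """l_k(t) = d_k - t - p(B_k(t), t), with B_k(t) = released jobs whose deadline is <= d_k."""
    dk = jobs[k].deadline
    work = sum(j.remaining for j in jobs if j.release <= t and j.deadline <= dk)
    return dk - t - work


def critical_deadline(jobs: list[JobState], t: float) -> float:
    """Deadline of a released job of minimum compound laxity (ties: smaller deadline)."""
    released = [k for k, job in enumerate(jobs) if job.release <= t]
    return min((compound_laxity(jobs, k, t), jobs[k].deadline) for k in released)[1]


def follows_cl_rule(jobs: list[JobState], chosen: int, t: float) -> bool:
    """CL rule: the job run at time t must have deadline <= d_crit(t)."""
    return jobs[chosen].deadline <= critical_deadline(jobs, t)


jobs = [JobState(0, 5, 1), JobState(0, 6, 4), JobState(0, 20, 1)]
edf_choice = min(range(3), key=lambda i: jobs[i].deadline)                          # job 0
llf_choice = min(range(3), key=lambda i: jobs[i].deadline - 0 - jobs[i].remaining)  # job 1
print(critical_deadline(jobs, 0), follows_cl_rule(jobs, edf_choice, 0),
      follows_cl_rule(jobs, llf_choice, 0), follows_cl_rule(jobs, 2, 0))
# 6 True True False
```

Here EDF and LLF pick different jobs at time 0, yet both respect the critical deadline 6, in line with Lemma 7; running the job with deadline 20 would break the CL rule.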

Corollary 2. EDF and LLF are optimal algorithms for the problem $1|r_j, pmtn|L_{\max}$.

Proof. The results follow from Lemma 7 and Theorem 1. □ 

Corollary 3. For any input instance $I$, any job $j$ in $I$, and any times $s$ and $t$ such that $r_j \le s \le t \le C^{EDF}_j$, if no jobs arrive during $(s, t]$, then $l^{EDF}_j(t) = l^{EDF}_j(s)$.

Proof. Consider any job $j$.

$$\begin{aligned}
l_j(t) &= l_j(s) - p(B_j(s, t)) - (t - s - q(B_j(t), s, t)) && \text{from Lemma 3} \\
&= l_j(s) - (t - s - q(B_j(t), s, t)) && \text{because no jobs arrive during } (s, t] \\
&= l_j(s) - (t - s - (t - s)) && \text{because EDF always runs a job with the smallest deadline} \\
& && \text{(no larger than } d_j\text{), which must be in } B_j(t) \\
&= l_j(s).
\end{aligned}$$

□

Theorem 2. If an online algorithm $A$ is not a CL algorithm, it is not optimal for the problem $1|r_j, pmtn|L_{\max}$.

Proof. Let $A$ be an online algorithm that is not a CL algorithm. Then there exist an input instance $I$ and a smallest time $t$ such that $A$ does not run any job in $B_w(t)$ in the interval $[t, t + \Delta)$, where $\Delta > 0$ and $w$ is a job such that $d_w = d_{\mathrm{crit}}(t)$. In other words, algorithm $A$ violates the CL rule during the interval $[t, t + \Delta)$. Without loss of generality, assume that no jobs arrive during the interval $(t, t + \Delta)$; if this is not true, decrease $\Delta$ until it becomes true.

Let $I_t$ be the input instance obtained from $I$ by removing all jobs $i$ with release time $r_i > t + \Delta$. Since the online algorithm $A$ cannot distinguish between $I$ and $I_t$ before time $t + \Delta$, the schedule produced by $A$ for $I_t$ is the same as that for $I$ during $[0, t + \Delta)$. Create an instance $I'$ from $I_t$ by adding to $I_t$ a new job $x$ with $r_x = t + \Delta$, $p_x = \max\{0, l^A_w(t) + L^*_{\max}(I_t)\}$, and $d_x = d_w$. We will show that $L^*_{\max}(I') = \max\{L^*_{\max}(I_t), -l^A_w(t)\}$, but $L^A_{\max}(I') \ge \Delta + \max\{L^*_{\max}(I_t), -l^A_w(t)\}$.

First, we show that $L^*_{\max}(I') = \max\{L^*_{\max}(I_t), -l^A_w(t)\}$. We can construct an optimal schedule for $I'$ in the following way: use the schedule produced by $A$ for $I$ (which is the same as that for $I_t$) for the interval $[0, t)$, and from time $t$ on, use EDF. Since $A$ follows the CL rule during $[0, t)$ and EDF is a CL algorithm, the entire schedule is a CL schedule, which is optimal. Call this schedule $S$. Note that $l^S_w(t) = l^A_w(t)$, because input instances $I$ and $I'$ are indistinguishable at time $t$ and schedules $A(I)$, $A(I')$, and $S$ are the same during the interval $[0, t)$. Let $X$ be the set of jobs that have finished by time $t + \Delta$ in schedule $S$, and let $Y$ be the set of jobs that have not finished by time $t + \Delta$ in schedule $S$. For any job $j$ in $Y$,



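$$\begin{aligned}
L^S_j &\le -l^S_j(C^S_j) && \text{from Lemma 4 part (a)} \\
&= -l^S_j(t + \Delta) && \text{from Corollary 3} \\
&= \begin{cases} -l^S_j(t) + p_x & \text{if } d_x \le d_j \\ -l^S_j(t) & \text{if } d_j < d_x \end{cases} && \text{from Lemma 3, the use of EDF, the arrival of job } x \text{ at time } t + \Delta, \\
& && \text{and the fact that no other jobs arrive during } (t, t + \Delta) \\
&\le p_x - l^S_w(t) && \text{because } l^S_w(t) = l^S_{\min}(t) \le l^S_j(t) \\
&= \max\{L^*_{\max}(I_t), -l^A_w(t)\} && \text{by the definition of } p_x.
\end{aligned}$$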



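Note that $L^S_i \le L^*_{\max}(I_t)$ for all $i \in X$, because $X$ is a subset of $I_t$ and $S$ is an optimal schedule. Thus, $L^*_{\max}(I') = \max\{\max_{i \in X} L^S_i, \max_{j \in Y} L^S_j\} = \max\{L^*_{\max}(I_t), -l^A_w(t)\}$. Next, we show that $L^A_{\max}(I') \ge \max\{L^*_{\max}(I_t), -l^A_w(t)\} + \Delta$.

$$\begin{aligned}
L^A_{\max}(I') &= -\min_j l^A_j(C^A_j) && \text{from Lemma 5} \\
&\ge -l^A_w(t + \Delta) && \text{from Corollary 1} \\
&= -(l^A_w(t) - p_x - \Delta) && \text{from Lemma 3, the arrival of job } x \text{, and the fact that} \\
& && \text{jobs in } B_w(t) \text{ do not run during } [t, t + \Delta) \\
&= \max\{L^*_{\max}(I_t), -l^A_w(t)\} + \Delta && \text{by the definition of } p_x.
\end{aligned}$$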

Thus, we have shown that algorithm $A$, which is not a CL algorithm, does not produce an optimal schedule for $I'$. □

Theorem 2 shows that, to be optimal, an algorithm cannot deviate from the CL rule at any time, even if the current critical compound laxity is large and the maximum lateness among the jobs completed so far is large. An interesting point to note is that if an online algorithm $A$ deviates from the CL rule, the adversary can construct a proof of the non-optimality of $A$ in an online fashion: as $A$ executes over time, as soon as it deviates from the CL rule, the adversary can generate one additional job that causes $A$ to have a larger lateness, while an optimal algorithm can still maintain a smaller lateness.
Abstract. We study the power assignment problem in radio networks, where each radio station can transmit in one of two possible power levels, corresponding to two ranges: short and long. We show that this problem is NP-hard, and present a polynomial-time assignment algorithm such that the number of transmitters that are assigned long range by the algorithm is at most (11/6) times the number of transmitters that are assigned long range by an optimal algorithm.



1 Introduction 

Assigning power levels (corresponding to transmission ranges) to the transmitters of a radio network, so that the total power consumption is as low as possible, is often an extremely important issue. Let $\mathcal{P}$ be a set of $n$ points in the plane, representing $n$ transmitters-receivers (or transmitters for short). We need to assign transmission ranges to the transmitters in $\mathcal{P}$ so that (i) the resulting communication graph is strongly connected, that is, the graph over $\mathcal{P}$ in which there exists a directed edge from $p$ to $q$ if and only if $q$ lies within the transmission range $r_p$ assigned to $p$ contains a directed path from any transmitter $p \in \mathcal{P}$ to any other transmitter $q \in \mathcal{P}$; and (ii) the total power consumption (i.e., the cost of the assignment of ranges) is minimized, where the total power consumption is a function of the form $\sum_{p \in \mathcal{P}} r_p^c$, and $c > 0$ is a constant typically between 2 and 5.

This version of the power assignment problem is known to be NP-hard; Kirousis et al. [10] first proved this for 3-dimensional point sets, and Clementi et al. [8] then proved it also for planar point sets. Kirousis et al. also present a 2-approximation algorithm, based on the minimum spanning tree of $\mathcal{P}$, which is the best approximation known.

In practice, it is usually impossible to assign arbitrary power levels (ranges) to the transmitters of a radio network. Instead, one can only choose from a constant number of preset power levels, corresponding to a constant number of ranges. In this paper we consider the power assignment problem in radio networks where each transmitter can transmit in one of two given power levels, low or high, corresponding to two possible ranges, short ($r_1$) and long ($r_2$). Since the cost of an assignment of power levels to the transmitters is a function of the form $n_1 r_1^c + (n - n_1) r_2^c$, where $n_1$ is the number of transmitters that are assigned range $r_1$ and $c \ge 1$ is some constant, the cost of an assignment is determined solely by the number of transmitters that are assigned range $r_2$.

In Sect. 4 we prove that the power assignment problem with two power levels is NP-hard when $r_2 > \dots$, by constructing a reduction from planar cubic vertex cover. More precisely, we show this for the special case where the initial components graph (see below) is a star.

Let $m$ be the number of transmitters that are assigned range $r_2$ in an optimal assignment OPT. In Sect. 2 we describe a polynomial-time algorithm that assigns range $r_2$ to at most $(11/6)m$ transmitters; in other words, this algorithm computes an $(11/6)$-approximation (with respect to the number of transmitters that are assigned long range).

An immediate corollary of this result is that for any ranges $r_1, r_2$ and for any $c$, we can compute an assignment whose cost is at most (11/6) times the cost of an optimal assignment. Usually, though, the cost of our assignment is much less than this, as shown in Sect. 2.1, where we analyze the common case $r_1 = 1$ and $r_2 = d$. In this case our algorithm computes an assignment whose cost is at most $d^c / ((6/11)d^c + 5/11)$ times the cost of an optimal assignment. Plugging in, for example, $d = 2$, we get a $44/29 \approx 1.52$ approximation if $c = 2$, and a $22/17 \approx 1.29$ approximation if $c = 1$.

A by-product of our range assignment algorithm is an algorithm for assigning ranges in the special case where the initial components graph is a tree. That is, consider the connected components of the communication graph that is obtained after assigning short range to all transmitters in $\mathcal{P}$. We draw an edge between two components $C_1$ and $C_2$ if and only if there exist transmitters $p_1 \in C_1$ and $p_2 \in C_2$ such that the distance between them is at most $r_2$. Now if this graph happens to be a tree, then the algorithm described in Sect. 3 assigns long range to at most $(4/3)m$ transmitters, where $m$ is the number of transmitters assigned long range by an optimal algorithm.

More related work. Other variants of the power assignment problem have been studied. One such variant is the symmetric power assignment problem, where the corresponding communication graph is undirected and there exists an edge between two transmitters $p$ and $q$ if and only if both transmitters were assigned ranges greater than or equal to the distance between them; see [2,3,4]. Clementi et al. [6] consider the problem of assigning ranges to a set of transmitters on a common line, so that for any two transmitters $p$ and $q$ there exists a path from $p$ to $q$ of at most $h$ hops in the corresponding (directed) communication graph. The case $h = n - 1$ was also considered by [10]. An important related problem is the minimum-energy broadcast tree problem: assign ranges to the transmitters so that a designated source transmitter can broadcast messages to all other transmitters; see, e.g., [5,7,11,12,14,15,16].
2 An (11/6)-Approximation

Let $\mathcal{P}$ be a set of $n$ points in the plane representing $n$ transmitters-receivers (or transmitters for short), and assume that each transmitter can transmit in one of two possible power levels, low or high, corresponding to short range ($r_1$) or long range ($r_2$). Further assume that if all transmitters in $\mathcal{P}$ are assigned long range, then the resulting communication graph is strongly connected. In this section we describe a polynomial-time algorithm for assigning ranges to the transmitters in $\mathcal{P}$ such that the number of transmitters that are assigned long range is at most $(11/6)m$, where $m$ is the number of transmitters that are assigned long range by OPT.

Let $G$ be the (undirected) graph of components, defined as follows. Assign short range to each transmitter in $\mathcal{P}$ and draw an edge between two transmitters $p$ and $q$ if $|p,q| \le r_1$, where $|p,q|$ denotes the Euclidean distance between $p$ and $q$. We think of each of the connected components in this graph as a subset of $\mathcal{P}$. These subsets are the nodes of the graph $G$; we shall call them components. We draw an edge between two components $C_1$ and $C_2$ of $G$ if there exist transmitters $p \in C_1$ and $q \in C_2$ such that $|p,q| \le r_2$. See Fig. 1.




Fig. 1. The components graph G. 
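The components graph can be built straight from this definition. The following sketch (function name hypothetical, quadratic time, for illustration only) merges points within distance $r_1$ using a union-find structure and then joins two components whenever some pair of their points is within distance $r_2$.

```python
import math
from itertools import combinations


def components_graph(points, r1, r2):
    """Return (comp, edges): comp[i] is the component of point i in the graph joining
    points at distance <= r1; edges holds pairs (a, b), a < b, of components that
    contain points p, q with |p, q| <= r2."""
    parent = list(range(len(points)))

    def find(i):  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r1:
            parent[find(i)] = find(j)

    roots = sorted({find(i) for i in range(len(points))})
    label = {root: idx for idx, root in enumerate(roots)}
    comp = [label[find(i)] for i in range(len(points))]

    edges = set()
    for i, j in combinations(range(len(points)), 2):
        a, b = comp[i], comp[j]
        if a != b and math.dist(points[i], points[j]) <= r2:
            edges.add((min(a, b), max(a, b)))
    return comp, edges


points = [(0, 0), (1, 0), (3, 0), (4, 0), (6, 0)]
comp, edges = components_graph(points, r1=1, r2=2)
print(comp, sorted(edges))  # [0, 0, 1, 1, 2] [(0, 1), (1, 2)]
```

In this small example the components graph is a path, so it is already a tree and the first stage of the algorithm would have nothing to do.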



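Notice that we can easily obtain a 2-approximation. Simply compute a minimum spanning tree of $G$, and, for each edge $(C_1, C_2)$ of the tree, assign long range to two transmitters $p \in C_1$ and $q \in C_2$ such that $|p,q| \le r_2$. Since OPT must assign long range to at least one transmitter in every component, this uses fewer than twice as many long-range assignments as OPT.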
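A sketch of this 2-approximation, assuming the component label of each point is already known (for instance from a union-find pass over the $r_1$-disk graph) and that $G$ is connected. Because $G$ is unweighted, any spanning tree is a minimum spanning tree, so breadth-first search suffices. The function name and input format are hypothetical.

```python
import math
from collections import deque


def two_approx_long_range(points, comp, r2):
    """Return the set of transmitter indices given long range: for each edge of a BFS
    spanning tree of the components graph, one witness pair p in C1, q in C2 with |p,q| <= r2."""
    k = max(comp) + 1
    witness = {}  # (a, b) -> (i, j) with comp[i] = a, comp[j] = b, |p_i, p_j| <= r2
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            a, b = comp[i], comp[j]
            if a != b and (a, b) not in witness and math.dist(p, q) <= r2:
                witness[(a, b)] = (i, j)
    long_range, seen, queue = set(), {0}, deque([0])
    while queue:
        a = queue.popleft()
        for b in range(k):
            if b not in seen and (a, b) in witness:
                seen.add(b)
                queue.append(b)
                long_range.update(witness[(a, b)])  # two transmitters per tree edge
    if len(seen) != k:
        raise ValueError("the components graph is assumed to be connected")
    return long_range


points = [(0, 0), (1, 0), (3, 0), (4, 0), (6, 0)]
print(sorted(two_approx_long_range(points, [0, 0, 1, 1, 2], r2=2)))  # [1, 2, 3, 4]
```

With $k$ components the returned set has at most $2(k-1)$ transmitters (fewer if a witness is reused), which is below $2m$ because $m \ge k$.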

Our range assignment algorithm consists of two stages. In the first stage we 
repeatedly find a cycle in G and reduce it to a single component by assigning 
long range to one transmitter in each of the components in the cycle. The second 
stage begins when there are no more cycles in G, i.e., when G is a tree. In this 
stage we assign long range to some more transmitters in order to complete our 
task. 

We now describe the first stage in detail. While there is a cycle in $G$, do the following. Let $C_1, C_2, \ldots, C_l, C_1$ be any cycle of size $l \ge 3$. Assign long range to any transmitter in $C_1$ that can reach a transmitter in $C_2$, assign long range to any transmitter in $C_2$ that can reach a transmitter in $C_3$, and so on. Altogether we assign long range to $l$ transmitters. Notice that after doing so any two transmitters in the union $C = C_1 \cup \cdots \cup C_l$ can communicate with each other, possibly through other transmitters in $C$. Thus these $l$ components reduce to a single component $C$, and the number of components decreases by $l - 1$. We update the graph $G$ by replacing $C_1, \ldots, C_l$ with the single component $C$. After doing so we forget that some of the transmitters in $C$ have already been assigned long range, and update the edges in $G$ accordingly; see Fig. 2.




Fig. 2. Reducing the cycle $C_1, C_2, C_3, C_4, C_1$ to the single component $C$.
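The first stage only needs the abstract components graph. The sketch below (names hypothetical) works on an adjacency dictionary with integer labels: it finds a cycle by depth-first search, counts one long-range assignment per component on the cycle, and contracts the cycle into a fresh node whose neighbours are the cycle's outside neighbours. It does not choose the actual transmitters, and the recursive DFS is meant only for small illustrative graphs.

```python
def find_cycle(adj):
    """Return the nodes of some cycle (length >= 3) of the simple undirected graph adj
    (dict: node -> set of neighbours), or None if adj is a forest."""
    visited, parent = set(), {}

    def dfs(u, p):
        visited.add(u)
        parent[u] = p
        for v in sorted(adj[u]):
            if v == p:
                continue
            if v in visited:  # back edge to an ancestor v: walk up from u to v
                cycle = [u]
                while cycle[-1] != v:
                    cycle.append(parent[cycle[-1]])
                return cycle
            found = dfs(v, u)
            if found:
                return found
        return None

    for s in sorted(adj):
        if s not in visited:
            found = dfs(s, None)
            if found:
                return found
    return None


def first_stage(adj):
    """Contract cycles until G is a tree; return (tree, number of long-range assignments)."""
    adj = {u: set(vs) for u, vs in adj.items()}
    long_range = 0
    while (cycle := find_cycle(adj)) is not None:
        long_range += len(cycle)  # one transmitter in each component of the cycle
        on_cycle = set(cycle)
        new = max(adj) + 1        # fresh label for the merged component C
        outside = set().union(*(adj[c] for c in cycle)) - on_cycle
        for c in cycle:
            del adj[c]
        for u in adj:
            if adj[u] & on_cycle:
                adj[u] = (adj[u] - on_cycle) | {new}
        adj[new] = outside
    return adj, long_range


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(first_stage(two_triangles))  # ({6: {7}, 7: {6}}, 6)
```

Each contracted triangle costs three assignments and removes two components, which is exactly the accounting used in the proof of Theorem 1.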



At this point there are no cycles left in $G$; in other words, $G$ is a tree. In the next section we present a range assignment algorithm for the case where the components graph is a tree. This algorithm assigns long range to at most $(4/3)m_{tree}$ of the transmitters, where $m_{tree}$ is the number of transmitters that are assigned long range by an optimal algorithm for this case. Thus in the second stage we apply the algorithm of the next section to $G$ to complete the range assignment task. We now show that the overall number of transmitters that were assigned long range is bounded by $(11/6)m$.

Theorem 1. The range assignment algorithm (described above) computes an $(11/6)$-approximation in polynomial time.

Proof. Recall that in the first stage a loop is executed such that, in each iteration, a cycle in $G$ of length at least three is found and replaced by a single component. Let $i$ be the number of cycles that were found during the execution of the loop. We assume that all these cycles are of length exactly three, since this is the worst case for our analysis (a cycle of length $l$ costs $l$ assignments and removes $l - 1$ components, and $l/(l-1)$ is largest for $l = 3$).

Let $k$ be the initial number of components in $G$, i.e., right at the beginning of the first stage. Then $m$, the number of transmitters assigned long range by OPT, is at least $k$, since in each initial component at least one of the transmitters must be assigned long range. During the first stage the algorithm assigns long range to at most $3i$ transmitters, and the number of components in $G$ at the end of the first stage is $k - 2i$.

At this point $G$ is a tree and we distinguish between two cases.

Case 1: $i \ge k/2 - m/3$. In this case, instead of performing the second stage, we proceed in the most trivial way (for the purpose of the analysis only) and assign long range to $2(k - 2i - 1)$ transmitters. That is, for each edge in $G$ connecting two components $C_1$ and $C_2$, we assign long range to any transmitter in $C_1$ that can reach a transmitter in $C_2$ and vice versa. The total number of transmitters that were assigned long range is thus bounded by

$$3i + 2(k - 2i - 1) < 2k - i \le 2k - \left(\frac{k}{2} - \frac{m}{3}\right) = \frac{3k}{2} + \frac{m}{3} \le \frac{11m}{6}.$$

Case 2: $i < k/2 - m/3$. In this case we perform the second stage as described in Sect. 3 and assign long range to at most $(4/3)m_{tree}$ transmitters, where $m_{tree}$ is the number of long range assignments needed to solve the tree $G$. But clearly $m_{tree} \le m$, so the number of transmitters assigned long range in the second stage is at most $(4/3)m$. The total number of transmitters that were assigned long range is thus bounded by

$$3i + \frac{4m}{3} < 3\left(\frac{k}{2} - \frac{m}{3}\right) + \frac{4m}{3} = \frac{3k}{2} + \frac{m}{3} \le \frac{11m}{6}.$$

Since in both cases we were able to bound the total number of long range assignments by $(11/6)m$, we conclude that our range assignment algorithm computes an $(11/6)$-approximation. □
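The case analysis can be double-checked with exact rational arithmetic. The sketch below checks the inequalities of the proof only, not the algorithm itself: for every $1 \le k \le m$ and every feasible number $i$ of contracted triangles, it evaluates the bound used in Case 1 or Case 2 and confirms that it never exceeds $(11/6)m$. The helper names are hypothetical.

```python
from fractions import Fraction


def stage_bound(k: int, m: int, i: int) -> Fraction:
    """Bound from the proof of Theorem 1: k initial components, m long-range
    assignments in OPT (so k <= m), i contracted cycles of length three."""
    if i >= Fraction(k, 2) - Fraction(m, 3):
        return Fraction(3 * i + 2 * (k - 2 * i - 1))  # Case 1: two transmitters per tree edge
    return 3 * i + Fraction(4, 3) * m                  # Case 2: (4/3)-approximation on the tree


def check_bound(max_m: int) -> bool:
    """Confirm stage_bound(k, m, i) <= (11/6) m for all 1 <= k <= m <= max_m."""
    for m in range(1, max_m + 1):
        for k in range(1, m + 1):
            for i in range((k - 1) // 2 + 1):  # each triangle removes two components
                if stage_bound(k, m, i) > Fraction(11, 6) * m:
                    return False
    return True


print(stage_bound(6, 6, 0), stage_bound(6, 6, 2), check_bound(40))  # 8 8 True
```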



Recall that the cost of an assignment is $n_1 r_1^c + (n - n_1) r_2^c$, where $n_1$ is the number of transmitters that are assigned range $r_1$ and $c \ge 1$ is some constant, typically between 2 and 5. An immediate corollary of Theorem 1 is that for any ranges $r_1, r_2$ and for any $c$, we can compute an assignment whose cost is at most (11/6) times the cost of an optimal assignment. Usually, though, the cost of our assignment is much less than this, as shown below.



2.1 The Cost for Ranges 1 and d 

Theorem 2. If $r_1 = 1$ and $r_2 = d$, then one can compute a range assignment whose cost is at most $d^c / ((6/11)d^c + 5/11)$ times the cost of an optimal assignment. For $d = 2$ we get a $(44/29)$-approximation if $c = 2$, and a $(22/17)$-approximation if $c = 1$.

Proof. The cost of an optimal algorithm is $d^c \cdot m + 1 \cdot (n - m) = n + (d^c - 1)m$, where $m$ is the number of transmitters assigned long range. We apply both our algorithm and the naive algorithm, which assigns range $d$ to all the transmitters. Put $\alpha = n/m$. We distinguish between two cases.

Case 1: $\alpha \le 11/6$. In this case we use the naive algorithm, whose cost is $d^c n$. The ratio between the cost of the naive algorithm and the cost of an optimal algorithm is

$$\frac{d^c n}{n + (d^c - 1)m} \le \frac{d^c n}{n + (d^c - 1)(6/11)n} = \frac{d^c}{(6/11)d^c + (5/11)}.$$






P. Carmi and M.J. Katz 



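Case 2: $\alpha > 11/6$. In this case we run our algorithm, whose cost is at most $d^c \cdot (11/6)m + 1 \cdot (n - (11/6)m) = n + (11/6)(d^c - 1)m$. The ratio between the costs is

$$\frac{n + (11/6)(d^c - 1)m}{n + (d^c - 1)m} = \frac{\alpha + (11/6)(d^c - 1)}{\alpha + (d^c - 1)} \le \frac{(11/6)d^c}{d^c + 5/6} = \frac{d^c}{(6/11)d^c + (5/11)},$$

where the inequality holds because the middle expression is non-increasing in $\alpha$, so its value at $\alpha = 11/6$ is an upper bound.

In both cases we get a $d^c / ((6/11)d^c + 5/11)$-approximation of the cost. Thus for $d = 2$ we get a $(44/29)$-approximation if $c = 2$, and a $(22/17)$-approximation if $c = 1$. □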
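The bound of Theorem 2 and the two cases of its proof can be evaluated exactly. In this sketch (hypothetical helper names), `cost_ratio_bound` computes $d^c/((6/11)d^c + 5/11)$, and `ratio_for_alpha` computes the ratio obtained in the proof for a given $\alpha = n/m$: the naive all-long assignment when $\alpha \le 11/6$, and the $(11/6)$-approximation otherwise.

```python
from fractions import Fraction


def cost_ratio_bound(d: int, c: int) -> Fraction:
    """Theorem 2 bound for r1 = 1, r2 = d: d^c / ((6/11) d^c + 5/11)."""
    dc = Fraction(d) ** c
    return dc / (Fraction(6, 11) * dc + Fraction(5, 11))


def ratio_for_alpha(alpha: Fraction, d: int, c: int) -> Fraction:
    """Ratio reached in the proof for alpha = n/m (cost of OPT is n + (d^c - 1) m)."""
    dc = Fraction(d) ** c
    if alpha <= Fraction(11, 6):  # Case 1: naive algorithm, cost d^c n
        return dc * alpha / (alpha + dc - 1)
    # Case 2: our algorithm, cost at most n + (11/6)(d^c - 1) m
    return (alpha + Fraction(11, 6) * (dc - 1)) / (alpha + dc - 1)


print(cost_ratio_bound(2, 2), cost_ratio_bound(2, 1))  # 44/29 22/17
```

Both cases meet at $\alpha = 11/6$: the first ratio increases with $\alpha$ and the second decreases, which is why $11/6$ is the threshold used in the proof.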

3 A (4/3)-Approximation for a Tree of Components

In this section we present a (4/3)-approximation algorithm for the case where the components graph $G$ is cycle free, i.e., where $G$ is a tree. In particular, $G$ may be the graph that is obtained at the end of the first stage of the general algorithm above.

We first pick an arbitrary component in $G$ to be the root of $G$. Given a component $C$ in $G$, we can now refer to its children components and to its parent component in the usual sense.

For each component $C$ we need to assign long range to some of the transmitters in $C$, so that for each child $C'$ of $C$ at least one of the transmitters in $C$ assigned long range can reach (a transmitter in) $C'$, and also at least one of these transmitters can reach the parent of $C$. A neighbor $C'$ of $C$ (i.e., one of its children or its parent) is satisfied if at least one of the transmitters in $C$ that can reach $C'$ when assigned long range is indeed assigned long range.

Initially all neighbors of $C$ are unsatisfied. Our goal is to assign long range to a small number of transmitters in $C$ so that all neighbors of $C$ are satisfied. One can view this problem as a set cover problem: for each transmitter $p$ in $C$, let $C_p$ be the subset of the neighbors of $C$ that can be reached from $p$ by assigning long range to $p$. It is easy to verify that the size of $C_p$ is at most 5 (since no two components in $C_p$ can be neighbors in $G$). Thus we could apply known results for $k$-set cover to achieve our goal; however, this would lead to a weaker result than the one that we obtain below.

We start with the leaf components. The case of a leaf component $C$ is very simple: we assign long range to any transmitter in $C$ that can reach the parent of $C$ (when it is assigned long range). After considering all leaf components, we consider the internal components, where an internal component may be considered only if all its children have already been considered.

Let $C$ be the internal component that is about to be considered. Let $x_C$ be the number of children of $C$. Clearly, for each child $C'$ of $C$, we must assign long range to at least one of the transmitters in $C'$ that can reach $C$ (after it is assigned long range). Let $m'_C$ be the number of long range assignments (to transmitters in $C$) needed to satisfy all children of $C$. Then $m_C$, the number of long range assignments (to transmitters in $C$) made by OPT, is either $m'_C$, if the $m'_C$ transmitters satisfying the children of $C$ can be chosen so that one of them also satisfies the parent of $C$, or $m_C = m'_C + 1$ otherwise. The following inequalities are immediate: $\sum_C m_C \le m$, where $m$ is the overall number of long range assignments made by OPT, and $\sum_C x_C = k - 1 < m$, where $k$ is the number of components in $G$. We will assign long range to at most $\frac{1}{3}x_C + m_C$ transmitters in $C$. Summing over all components in $G$, we obtain

$$\sum_C \left(\frac{1}{3}x_C + m_C\right) < \frac{1}{3}m + m = \frac{4}{3}m.$$

For each transmitter $p$ in $C$, let $d_p$ (the degree of $p$) be the number of unsatisfied children of $C$ that would be satisfied if $p$ were assigned long range. Notice that $d_p$ refers only to the children of $C$ and not to its parent. After assigning long range to a transmitter $q$ in $C$ we update the degrees $d_p$ of all transmitters $p$ in $C$ (in particular, $d_q$ becomes 0).

We are now ready to describe our algorithm for assigning long range to transmitters in $C$. If $x_C \le 2$, then we “solve” $C$ optimally, that is, we find a minimum subset of transmitters in $C$ that can reach all children of $C$ and can also reach its parent (when assigned long range). We can do this since in this case $m_C \le 3$.

Otherwise, as long as the number of unsatisfied children is at least 3 and there exists a transmitter of degree at least 3, we assign long range to any such transmitter $q$ and update the degrees of all transmitters in $C$ accordingly. By assigning long range to $q$ we satisfy at least 3 of the children of $C$. Since for each of these 3 children OPT assigns long range to one of its transmitters so that it can reach $C$, we charge the assignment to $q$ to these 3 assignments of OPT. Thus in this loop we have used at most $\frac{1}{3}(x_C - x)$ long range assignments to transmitters in $C$, where $x \ge 0$ is the number of remaining unsatisfied children of $C$.

At this point either $x \le 2$, or $x \ge 3$ and all transmitters in $C$ have degree at most 2. In the former case we “solve” the remaining subproblem optimally (assigning long range to at most $m_C$ transmitters in $C$). We have used in total at most $m_C + \frac{1}{3}x_C$ long range assignments.

In the latter case, where we are left with at least 3 unsatisfied children and transmitters of degree at most 2, we first assign long range to any transmitter in $C$ that can reach $C$'s parent (when assigned long range), and update the degrees of the transmitters in $C$. We charge this assignment to the at least 3 remaining unsatisfied children of $C$. Next we “solve” the remaining subproblem optimally using the optimal solution to 2-set cover [9]. Again we have used in total at most $m_C + \frac{1}{3}x_C$ long range assignments.

Theorem 3. If the components graph $G$ is a tree, one can compute a range assignment that is a $(4/3)$-approximation in polynomial time.
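The per-component procedure can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: `reach[p]` lists the neighbours of $C$ that transmitter $p$ reaches with long range (child labels, plus the hypothetical marker `PARENT` if $p$ reaches the parent), every target is assumed reachable, and the brute-force `min_cover` stands in for the polynomial-time optimal 2-set-cover step, so it is only suitable for small examples. The sketch also skips the extra parent assignment when the parent is already satisfied, which can only lower the count.

```python
from itertools import combinations

PARENT = "parent"  # marker for the parent component of C


def min_cover(reach, targets):
    """Smallest set of transmitters whose reach sets cover `targets` (brute force)."""
    names = sorted(reach)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            if targets <= set().union(*(reach[p] for p in chosen)):
                return set(chosen)
    raise ValueError("targets cannot be covered")


def assign_component(reach, children, has_parent=True):
    """Transmitters of an internal component C that receive long range."""
    if len(children) <= 2:  # x_C <= 2: solve C optimally, parent included
        return min_cover(reach, set(children) | ({PARENT} if has_parent else set()))
    chosen, unsatisfied = set(), set(children)
    while len(unsatisfied) >= 3:
        # a transmitter of degree >= 3 w.r.t. the unsatisfied children, if any
        q = next((p for p in sorted(reach) if len(reach[p] & unsatisfied) >= 3), None)
        if q is None:
            break
        chosen.add(q)
        unsatisfied -= reach[q]
    parent_done = not has_parent or any(PARENT in reach[p] for p in chosen)
    if len(unsatisfied) >= 3 and not parent_done:
        # all degrees are now <= 2: give long range to some transmitter reaching the parent
        q = next(p for p in sorted(reach) if PARENT in reach[p])
        chosen.add(q)
        unsatisfied -= reach[q]
        parent_done = True
    rest = unsatisfied | (set() if parent_done else {PARENT})
    return chosen | min_cover(reach, rest)


reach = {"a": {1, 2, 3}, "b": {4, 5}, "c": {4, PARENT}, "d": {5}}
print(sorted(assign_component(reach, {1, 2, 3, 4, 5})))  # ['a', 'b', 'c']
```

In the example the greedy loop picks `a` (degree 3) and the remaining children 4 and 5, together with the parent, are then covered optimally by `b` and `c`.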
4 NP-Hardness

Let $r_1$ and $r_2$ be any two ranges such that $r_2 > \dots$. In this section we show that the problem of finding an optimal range assignment for a given set $\mathcal{P}$ of points in the plane (representing transmitters-receivers) is NP-hard. One can think of the problem as follows: assign short range ($r_1$) to each of the transmitters in $\mathcal{P}$; the goal now is to find a smallest subset $\mathcal{P}'$ of transmitters such that, after assigning long range ($r_2$) to each of the transmitters in $\mathcal{P}'$, one obtains a strongly connected graph.

Consider the components graph $G$ that is obtained when each transmitter in $\mathcal{P}$ is assigned short range (see Sect. 2 for a precise definition of $G$). We show that even the special case where $G$ is a star, i.e., $G$ consists of one central component $C$ that is connected to $k$ orbit components (see Fig. 3), is NP-hard. In this case, the problem is to find a smallest subset of transmitters in $C$ that satisfies all orbit components (when each of the transmitters in the subset is assigned long range).




Fig. 3. A star graph of components. 



We describe a reduction from minimum vertex cover in planar cubic graphs. Let $PCG = (V, E)$ be a planar cubic graph (i.e., each of the nodes in $PCG$ has degree at most 3). A vertex cover for $PCG$ is a subset $U$ of $V$ such that, for each edge $(v_1, v_2) \in E$, either $v_1 \in U$ or $v_2 \in U$. The problem of finding a vertex cover of minimum size in planar cubic graphs is known to be NP-hard [1,9].

Valiant [13] showed that any planar cubic graph $PCG = (V, E)$ can be embedded in a rectangular grid of size $O(|V|^2)$ as follows. Each node $v \in V$ corresponds to some grid vertex, and each edge $(v_1, v_2) \in E$ corresponds to a rectilinear path formed of grid edges, whose endpoints are the grid vertices corresponding to $v_1$ and $v_2$. Moreover, the interiors of any two such paths are disjoint.

Fig. 4. Converting the embedded graph $PCG'$ to a star graph of components.

We now convert the embedded graph $PCG' = (V', E')$ into a star components graph $G$; see Fig. 4. We assume that the distance between adjacent grid vertices is $3r_2$. Each edge $e' \in E'$ is converted into an orbit component of $G$, and the set $V'$ is converted into the central component of $G$. We convert $e' = (v'_1, v'_2) \in E'$ into an orbit component by placing transmitters on the path $e'$ as follows: place transmitters along $e'$, beginning at the point on $e'$ at distance $r_2$ from $v'_1$ and ending at the point on $e'$ at distance $r_2$ from $v'_2$, such that the distance between any two consecutive transmitters is at most $r_1$.




Fig. 5. Connecting between $v'_1$ and $v'_2$.

We convert the set $V'$ into the central component by placing transmitters as follows. For each edge $e' = (v'_1, v'_2) \in E'$ we place transmitters at $v'_1$ and at $v'_2$ and along one of the two dashed paths between them (see Fig. 5), so that the distance between any two consecutive transmitters is at most $r_1$. The requirement $r_2 > \dots$ ensures that, if we are careful, none of the transmitters along the portion of the dashed path connecting $v'_1$ (alternatively, $v'_2$) to the center of the appropriate adjacent grid cell is within distance $r_1$ of a transmitter belonging to an orbit component. (Notice that we may assume that $PCG$ is connected, since otherwise we could find a minimum vertex cover for each of its connected components, and their union would be a minimum vertex cover for $PCG$.)

It is easy to verify that we obtain a star components graph $G$. That is, (i) a transmitter in an orbit component $C'$ that is assigned long range either cannot reach any other component or can reach only the central component (as is the case for the extreme transmitters in $C'$), and (ii) for each orbit component $C'$ obtained from the edge $e' = (v'_1, v'_2)$ there exists a transmitter in the central component that can reach $C'$ when assigned long range; the transmitters at $v'_1$ and at $v'_2$ are such transmitters.

Moreover, for any transmitter $p$ in the central component, there exists a vertex $v'$ that dominates it, in the sense that if both $p$ and $v'$ are assigned long range, then any orbit component that can be reached from $p$ can also be reached from $v'$. Therefore, when solving the range assignment problem, we may restrict ourselves to vertices $v'$ in the central component. Also, the total number of transmitters used in the construction is polynomial in $n$. Finally, an optimal solution for the range assignment problem corresponds to a minimum vertex cover for the graph $PCG$.

Theorem 4. Let $r_1$ and $r_2$ be any two ranges such that $r_2 > \dots$. Then the problem of finding an optimal range assignment (where $r_1$ and $r_2$ are the two possible ranges) for a given set $V$ of points in the plane is NP-hard. 
Abstract. We show that for any set of disjoint line segments in the plane there exists a pointed binary encompassing tree $T$. That is, $T$ is a spanning tree on the segment endpoints that contains all input segments, has maximum degree three, and in which every vertex $v \in T$ is pointed, meaning that $v$ has an incident angle greater than $\pi$. Such a tree can be completed to a minimum pseudo-triangulation. In particular, it follows that every set of disjoint line segments has a minimum pseudo-triangulation of bounded vertex degree. 



1 Introduction 

Disjoint line segments in the plane are among the fundamental objects of computational geometry. They form the atomic structure of most planar geometric data structures and geographic information systems. Planar objects are typically represented by a polygonal approximation which, in turn, is composed of interior-disjoint line segments. Not surprisingly, researchers have studied many of their combinatorial properties, such as visibility, compact representation, and ray shooting. 

Geometric graphs. We follow one particularly well-studied trail: that of constrained geometric graphs. A geometric graph is a graph together with a planar embedding such that the edges are straight line segments. We consider crossing-free geometric graphs, that is, we do not allow two edges to cross. Given a set of disjoint segments in the plane (that is, a crossing-free geometric matching), we say that a graph is encompassing if it is a connected crossing-free geometric graph that contains all input segments as edges (without Steiner points). 
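The definition of an encompassing graph translates directly into a checker. The Python sketch below is a minimal illustration. It assumes that points are coordinate tuples and that the input is in general position (no three segment endpoints collinear), so two edges that share an endpoint never overlap. The function names are hypothetical.

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the turn a -> b -> c: +1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def on_box(a, b, c):
    """For c collinear with ab: does c lie within segment ab?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def intersect(s, t):
    """True if the closed segments s and t share at least one point."""
    (a, b), (c, d) = s, t
    o1, o2, o3, o4 = orient(a, b, c), orient(a, b, d), orient(c, d, a), orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    return ((o1 == 0 and on_box(a, b, c)) or (o2 == 0 and on_box(a, b, d))
            or (o3 == 0 and on_box(c, d, a)) or (o4 == 0 and on_box(c, d, b)))

def is_encompassing(segments, edges):
    """Connected, crossing-free, contains every segment, no Steiner points."""
    vertices = {p for s in segments for p in s}
    edge_set = {frozenset(e) for e in edges}
    if any(frozenset(s) not in edge_set for s in segments):
        return False  # an input segment is missing
    if any(p not in vertices for e in edges for p in e):
        return False  # an edge uses a Steiner point
    for e, f in combinations(edges, 2):
        if set(e) & set(f):
            continue  # shared endpoint: allowed (general position assumed)
        if intersect(e, f):
            return False
    # Connectivity by depth-first search over the segment endpoints.
    adj = {v: [] for v in vertices}
    for p, q in edges:
        adj[p].append(q)
        adj[q].append(p)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices
```

For example, two horizontal segments joined by one vertical edge form an encompassing graph. Adding the two crossing diagonals between them does not.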

It is known that a Hamiltonian encompassing circuit (or path) does not always exist [21]. In fact, if the segments are allowed to intersect at their endpoints, it is NP-complete to decide whether a Hamiltonian encompassing circuit exists for a given set of segments [17]. Among $n$ disjoint segments in the plane there are always $\Omega(\log n)$ for which an encompassing path exists [11]; this number amounts to $\Omega(\sqrt{n})$ if all segments are axis-parallel [20]. 

The maximum degree of an encompassing tree on the segment endpoints that is constrained to contain all input segments must therefore, in general, be at least three. After a preliminary upper bound of seven by Bose and Toussaint [6], Bose et al. [5] proved that an encompassing tree with maximum degree three always exists. Later, Hoffmann and Tóth [12] showed that there is also a Hamiltonian encompassing graph with maximum degree three. 

Pseudo-triangulations. Recently a relaxation of triangulations, called pseudo-triangulations, has received considerable attention. Here, faces are bounded by three concave chains rather than by three line segments. More formally, a pseudo-triangle is a planar polygon that has exactly three convex vertices, that is, vertices with internal angle less than $\pi$. Pseudo-triangulations were originally studied for convex sets and for simple polygons because of their applications to visibility [15,16] and ray shooting [7,10]. In the last few years they have also found applications in robot motion planning [19], kinetic collision detection [1,14], and guarding [18]. 
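In code, the pseudo-triangle condition amounts to counting convex corners. The Python sketch below is an illustration and is not part of the paper. It assumes a simple polygon given as a list of vertices, with no three consecutive vertices collinear. Convex corners are detected as left turns after the vertex order is normalized to counterclockwise.

```python
def turn(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def signed_area2(poly):
    """Twice the signed area of a polygon (positive if counterclockwise)."""
    return sum(p[0] * q[1] - q[0] * p[1] for p, q in zip(poly, poly[1:] + poly[:1]))

def convex_corners(poly):
    """Number of vertices with interior angle < pi in a simple polygon."""
    if signed_area2(poly) < 0:
        poly = poly[::-1]  # normalize to counterclockwise order
    k = len(poly)
    return sum(turn(poly[i - 1], poly[i], poly[(i + 1) % k]) > 0 for i in range(k))

def is_pseudo_triangle(poly):
    return convex_corners(poly) == 3

# Corners (0,0), (6,0), (3,6) with a reflex vertex (3,2) on one side chain.
print(is_pseudo_triangle([(0, 0), (6, 0), (3, 2), (3, 6)]))  # True
```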

Of particular interest are the so-called minimum pseudo-triangulations, which have the minimum number of pseudo-triangular faces among all possible pseudo-triangulations of a given domain. They were introduced by Streinu [19], who proved that every minimum pseudo-triangulation of a set $S$ of $n$ points consists of exactly $n - 2$ pseudo-triangles. Minimum pseudo-triangulations are also referred to as pointed pseudo-triangulations, since every vertex $v$ of a minimum pseudo-triangulation has an incident region whose angle at $v$ is greater than $\pi$. 
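Pointedness is a local condition. A vertex $v$ is pointed when the directions of its incident edges leave an angular gap larger than $\pi$. Below is a small Python sketch, assuming the incident edges are given by their other endpoints; that representation is chosen only for illustration. An angle of exactly $\pi$ does not count, matching "greater than $\pi$".

```python
import math

def is_pointed(v, neighbors):
    """True if the edges from v to `neighbors` leave a free angle > pi at v."""
    if len(neighbors) < 2:
        return True  # degree 0 or 1: the free angle is 2*pi
    angles = sorted(math.atan2(y - v[1], x - v[0]) for x, y in neighbors)
    # Gaps between angularly consecutive edges, plus the wrap-around gap.
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])
    return max(gaps) > math.pi

print(is_pointed((0, 0), [(1, 0), (0, 1)]))  # True
```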

Pseudo-triangulations, just like triangulations, are crossing-free geometric graphs. But while triangulations of a planar point set can have arbitrarily high vertex degree, there is always a pseudo-triangulation of vertex degree at most five [13]. Bounded vertex degree is a useful property for many applications, since it enables local operations or updates in constant time. 

Streinu [19] showed that every pointed geometric graph can be completed to a pointed pseudo-triangulation by greedily adding edges while maintaining pointedness. However, this approach does not provide any guarantee regarding the vertex degree. On the other hand, pointed spanning trees are not omnipresent in planar structures: Aichholzer et al. [3] recently established that there are triangulations which do not contain any pointed spanning tree. Furthermore, both the algorithm of Bose et al. [5] and that of Hoffmann and Tóth [12] violate pointedness due to their proof techniques. 

Results. Here, we show how to construct an encompassing tree that respects 
pointedness and has maximum vertex degree at most three: 

Theorem 1. For any finite set of disjoint line segments in the plane there exists 
a pointed binary encompassing tree. 

Our proof is constructive: we describe a recursive algorithm that builds a binary pointed encompassing tree for $n$ disjoint segments in $O(n^{4/3} \operatorname{polylog} n)$ time. 

Aichholzer et al. [2] showed that a bounded degree pseudo-triangulation constrained to contain a Hamiltonian circuit (a simple polygon) always exists, with a degree bound of seven. With the help of a pointed binary encompassing tree, we can extend these results to pseudo-triangulations constrained to contain disjoint line segments: 
Theorem 2. For any finite set of disjoint line segments there is a pointed encompassing pseudo-triangulation with maximum vertex degree at most ten. 



Organization. The next section presents the definition of a special class of polygons that we call necklaces. Section 3 provides an algorithmic overview and states the extensive set of invariants that we maintain during our construction. Section 4 gives the actual algorithm that constructs a necklace from the set of input segments. In Section 5, we prove Theorem 1 and sketch the runtime analysis of our algorithm. Finally, Section 6 shows how to combine the encompassing tree with the algorithm described in [2] to construct a pointed encompassing pseudo-triangulation with a maximum vertex degree of ten. 



2 Definitions and Basic Operations 

A polygon $P$ is a sequence $(p_1, p_2, \ldots, p_k)$ of points in the plane. Denote the set of vertices of $P$ by $V(P) = \{p_1, p_2, \ldots, p_k\}$ and the set of edges by $E(P) = \{p_1p_2, p_2p_3, \ldots, p_{k-1}p_k, p_kp_1\}$. Let $\partial P$ denote the closed path $p_1p_2 \cup p_2p_3 \cup \ldots \cup p_kp_1$, and let $\deg_P(p)$ be the number of edges incident to a point $p \in V(P)$. A polygon is weakly simple if (i) any two edges are either disjoint or intersect in one endpoint, (ii) $\sum_{i=1}^{k} \angle p_{i+1}p_ip_{i-1} = (k-2)\pi$, and (iii) all edges incident to $p_i$ are in the closed angular domain $\angle p_{i+1}p_ip_{i-1}$ for $i = 1, 2, \ldots, k$ (indices taken modulo $k$). For a weakly simple polygon $P$, we define $\overline{P}$ as the closed polygonal domain enclosed by $\partial P$. We denote the interior of the polygonal domain $\overline{P}$ by $\mathrm{int}(P)$. A polygon $P$ is simple if $\partial P$ is a simple closed curve. A vertex $p_i$ of a polygon is convex (reflex) if $\angle p_{i+1}p_ip_{i-1}$ is convex (reflex). A polygon is convex if all of its vertices are convex. Finally, an orientation $u(P)$ of the vertices of a polygon $P$ is a function $u : V(P) \to \{-1, +1\}$. In analogy to the notation for polygons, for a line segment $s$ we use $\overline{s}$ to refer to it as a set of points in the plane. Similarly, for a set $S$ of line segments let $\overline{S} := \{\overline{s} \mid s \in S\}$. 
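Condition (ii) and the convex/reflex classification both rest on the angle $\angle p_{i+1}p_ip_{i-1}$. The Python sketch below assumes the polygon is given counterclockwise as a list of coordinate tuples with no zero-length edges. It measures the angle at $p_i$ counterclockwise from the ray towards $p_{i+1}$ to the ray towards $p_{i-1}$, which gives the interior angle for counterclockwise input. An angle of exactly $\pi$ is reported as reflex, an arbitrary choice for this illustration.

```python
import math

def interior_angles(poly):
    """Angle at each p_i of a counterclockwise polygon [p_1, ..., p_k],
    measured counterclockwise from ray p_i p_{i+1} to ray p_i p_{i-1}."""
    k = len(poly)
    result = []
    for i, (x, y) in enumerate(poly):
        nx, ny = poly[(i + 1) % k]
        px, py = poly[i - 1]
        a_next = math.atan2(ny - y, nx - x)
        a_prev = math.atan2(py - y, px - x)
        result.append((a_prev - a_next) % (2 * math.pi))
    return result

def classify(poly):
    return ["convex" if a < math.pi else "reflex" for a in interior_angles(poly)]

def has_polygon_angle_sum(poly, eps=1e-9):
    """Condition (ii): the angles sum to (k - 2) * pi."""
    return abs(sum(interior_angles(poly)) - (len(poly) - 2) * math.pi) < eps

print(classify([(0, 0), (6, 0), (3, 2), (3, 6)]))
```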

Definition 1. For a set S of segments in the plane, a necklace P is a weakly 
simple polygon such that 

— every vertex is an endpoint of a segment in S; 

— at every vertex, the incident edges and input segments are pointed; 

— the degree (w.r.t. P) of every vertex is two or four; 

— all segments of S are contained in P. 

An edge $e \in E(P)$ is called a segment edge if $e \in S$, and a visibility edge otherwise. A segment $pq \in S$ is saturated with respect to a necklace $P$ iff $\{p, q\} \subseteq V(P)$. A vertex $p \in V(P)$ is saturated iff the incident segment from $S$ is saturated. 

The graph of a weakly simple polygon is a tree of rings, that is, a union of rings such that any two rings have at most one common vertex and every cycle in the graph is one of the rings. If we represent every ring by a node and connect two nodes when the corresponding rings have a common vertex, then we obtain a tree. If a necklace polygon has $\ell$ vertices of degree four, its graph is composed of $\ell + 1$ rings. 

Fig. 1. A necklace and its structure: a tree of rings with diagonals. 

The segments from $S$ that form internal diagonals of $P$, also referred to as segment diagonals, partition the rings into interior-disjoint sub-rings, any two of which are either disjoint or share a common segment edge. The tree structure of the rings (and sub-rings) allows us to delete one edge from every sub-ring while maintaining connectivity. For this we color the edges of the necklace red or black in such a way that every sub-ring contains exactly one red visibility edge and every degree four vertex is incident to a red edge. Deleting all red edges from the necklace then yields a binary encompassing tree. 
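The red/black mechanism can be checked combinatorially: after the red edges are deleted, what remains should be a spanning tree of maximum degree three. The Python sketch below tests this with a union-find structure. The graph representation and the small hexagonal example are hypothetical and not taken from Fig. 1. In the example, vertices 0 to 5 carry segment edges 12 and 45 and a segment diagonal 03, and visibility edges 23 and 50 are red, one per sub-ring.

```python
from collections import defaultdict

def delete_red_edges(edges):
    """edges: list of (u, v, color) with color 'red' or 'black'.
    Segment diagonals are listed as black edges, so they are kept."""
    return [(u, v) for u, v, color in edges if color != "red"]

def is_binary_spanning_tree(vertices, tree_edges):
    """True if tree_edges form a spanning tree on vertices with max degree <= 3."""
    if len(tree_edges) != len(vertices) - 1:
        return False
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    degree = defaultdict(int)
    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
    return max(degree.values(), default=0) <= 3

V = list(range(6))
E = [(0, 1, "black"), (1, 2, "black"), (2, 3, "red"), (3, 4, "black"),
     (4, 5, "black"), (5, 0, "red"), (0, 3, "black")]
print(is_binary_spanning_tree(V, delete_red_edges(E)))  # True
```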



2.1 Basic Operations 

The two basic operations used in our algorithms are based on geodesic curves. 
Roughly speaking, a geodesic curve between two points is a shortest curve from 
a specific class of curves that connect the points. 

Definition 2. Consider two distinct points $p$ and $q$, and a finite set $S$ of line segments in the plane. Let $\Gamma(p, q)$ be the set of all simple polygonal paths between $p$ and $q$ that do not cross any segment from $S$. For any $g \in \Gamma(p, q)$, denote by $\mathrm{geo}(g)$ the shortest curve from $p$ to $q$ that is homotopic to $g$ within $\Gamma(p, q)$. We say that $\mathrm{geo}(g)$ is a geodesic curve between $p$ and $q$ with respect to $S$. 

Operation 1: Build_cap(P, u, i). See Fig. 2 for an example. 

Input: A necklace $P = (p_1, p_2, \ldots, p_k)$, an orientation $u(P)$, and an unsaturated convex vertex $p_i \in V(P)$. Operation: Let $qp_i$ be the input segment incident to $p_i$. Replace the edge $p_ip_{i+u(p_i)}$ by the path $p_iq \cup \mathrm{geo}(q, p_i, p_{i+u(p_i)})$. Set $u(p) := u(p_i)$ for every interior vertex $p$ of $\mathrm{geo}(q, p_i, p_{i+u(p_i)})$, including $q$. 

Operation 2: Extend_reflex(P, u, i, r_i). See Fig. 3 for an example. 

Input: A necklace $P = (p_1, p_2, \ldots, p_k)$, an orientation $u(P)$, a reflex vertex $p_i$ of $P$, and a ray $r_i$ emanating from $p_i$ such that $r_i$ cuts $\angle p_{i+1}p_ip_{i-1}$ into two convex angles and hits a segment $ef \in S$ with $\overline{ef} \subset \mathrm{int}(P)$ at $g \in \overline{ef}$. Operation: Without loss of generality, suppose that $p_{i+u(p_i)}$ and $f$ are on the same side of the supporting line of $r_i$. Replace the edge $p_ip_{i+u(p_i)}$ of $P$ by the path $\mathrm{geo}(p_i, g, e) \cup (e, f) \cup \mathrm{geo}(f, g, p_i, p_{i+u(p_i)})$. Set $u(\cdot) := -u(p_i)$ for every interior vertex of $\mathrm{geo}(p_i, g, e)$, including $e$, and $u(\cdot) := u(p_i)$ for every interior vertex of $\mathrm{geo}(f, g, p_i, p_{i+u(p_i)})$, including $f$. 

Fig. 2. The result of Build_cap(P, u, i) for $u(p_i) = +1$ and $u(p_i) = -1$. 

Fig. 3. The result of Extend_reflex(P, u, i, r_i). 

Note that whenever an operation produces a new reflex vertex $p_a$, the orientation is chosen so that the edge $p_ap_{a+u(p_a)}$ is a visibility edge of $P$. Extend_reflex then replaces the visibility edge by an open polygonal chain. This guarantees that all segments remain in $\overline{P}$, as opposed to the operations in [12], where segments of $S$ could become outer diagonals. It is not difficult to see that the two basic operations are necklace preserving; we omit the formal proof here. By iteratively applying Operation 1, we can make sure that every segment is either saturated or lies in the interior of the necklace. 

Both_endpoints(P, u) 

Input: A necklace $P = (p_1, p_2, \ldots, p_k)$ and an orientation $u(P)$. Operation: Apply Build_cap(P, u, i) as long as there is an unsaturated vertex $p_i \in V(P)$. 

3 Algorithmic Overview 

Starting from the convex hull of the segments, our algorithm greedily constructs a necklace that incorporates input segments either as edges or as internal diagonals. When the greedy algorithm terminates, it might happen that not all segments are included in the necklace. Therefore we maintain an extensive set of invariants, collected in Lemma 1, that allow us to proceed by induction. By maintaining a necklace $P$, we ensure that all remaining segments lie in the interior of $\overline{P}$. We color the edges of $P$ red or black, so that we can later delete the red edges to reduce the maximum degree to three. As a next step we generate a convex partition $\mathcal{D}$ of $\overline{P}$. We can then apply induction in the interior of every convex piece. Finally, we assign a vertex of $P$ to every face $D \in \mathcal{D}$. At this vertex we connect the inductively computed encompassing tree for $D$ to the necklace, while maintaining pointedness and low degree. 

The proof of the following key lemma, Lemma 1, based on an algorithm 
constructing a necklace polygon, is the subject of the next section. 

Lemma 1. For a set $S$ of disjoint line segments, not all in a line, and a vertex $x$ of $\mathrm{conv}(S)$, there is a necklace $P$, a partition $\mathcal{D}$ of $\overline{P}$, an assignment $t : \mathcal{D} \to V(P) \setminus \{x\}$, and an edge coloring $\gamma : E(P) \setminus \overline{S} \to \{\text{red}, \text{black}\}$ satisfying the following properties. 

(L1) $x$ is a vertex of $P$ such that $\deg_P(x) = 2$ and $x$ is incident to a red edge; 

(L2) every minimal cycle in $E(P) \cup S$ contains exactly one red edge; 

(L3) for every $D \in \mathcal{D}$, the point $t(D)$ is a vertex of $D$; 

(L4) for every $p \in V(P)$, the number of edges in $E(P) \cup S$ incident to $p$ plus the number of regions of $\mathcal{D}$ assigned to $p$ is at most 3 plus the number of red edges incident to $p$; 

(L5) every $s \in S$ is either saturated, or there is a $D \in \mathcal{D}$ such that $\overline{s} \subset \mathrm{int}(D)$; 

(L6) every polygon $D \in \mathcal{D}$ is convex; 

(L7) for every $D \in \mathcal{D}$, the edges and input segments incident to $t(D)$ and $D \cap \overline{S}$ are on one side of a line through $t(D)$; 

(L8) at most two regions, $D_1$ and $D_2$, are assigned to every point $p \in V(P) \setminus \{x\}$; and in this case $D_1 \cup D_2$ is a simple polygonal domain with exactly one reflex vertex, which is at $p$. 



4 Constructing a Necklace 

We describe our algorithm that constructs a necklace $P$ along with an edge coloring $\gamma$, a partition $\mathcal{D}$, and a vertex assignment $t(\cdot)$ for every region of $\mathcal{D}$. Properties (L1)–(L4) are maintained throughout the algorithm. We apply Build_cap and Extend_reflex repeatedly to ensure (L5). Finally, further partitioning of the non-convex regions in $\mathcal{D}$ establishes (L6)–(L8). 

Both basic operations replace an edge $pq \in E(P)$ by a polygonal chain $\chi$. If $pq$ is red, we need to color the edges of $\chi$ carefully in order to maintain (L4) at both $p$ and $q$. Therefore, for every red edge, we label one endpoint as its anchor. Whenever a red edge $pq$ with anchor $p$ is replaced by a polygonal chain $\chi$, we color one edge of $\chi$ red and anchor it at $p$. For this, we have to make sure that this edge of $\chi$ is a visibility edge. In the case of Build_cap(P, u, i), if $p_ip_{i+u(p_i)}$ is red, then its anchor has to be at $p_{i+u(p_i)}$. For Extend_reflex(P, u, i, r_i), the edges of $\chi$ incident to $p_i$ and $p_{i+u(p_i)}$ are both visibility edges. Notice that neither operation replaces a red edge $pq$ if both $p$ and $q$ are saturated and are convex vertices of all their incident regions from $\mathcal{D}$. 




448 M. Hoffmann, B. Speckmann, and C.D. Toth 



Initialization. Let $P := \mathrm{conv}(S)$. Label the vertices of $P$ by $x = p_1, p_2, \ldots, p_m$ such that $p_1p_2 \notin S$, without loss of generality in counterclockwise order. Let $u(x) = -1$ and $u(p) = +1$ for every $p \in V(P) \setminus \{x\}$. The segment diagonals of $\mathrm{conv}(S)$ partition $\overline{P}$ into convex polygons. Let these polygons form the initial set $\mathcal{D}$. For every $D \in \mathcal{D}$, let $t(D)$ be the second vertex in the sequence $p_1, p_2, \ldots, p_m$ that is incident to $D$. Furthermore, color red the visibility edge from $E(D)$ incident to $t(D)$ that leads to the vertex with minimal index, and set its anchor to $t(D)$. All other edges of $\mathrm{conv}(S)$ are colored black. See Fig. 4 for an example. 
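The labeling part of the initialization can be sketched in Python as follows. This is an illustration under stated assumptions. Segments are pairs of coordinate tuples. The hull is computed with Andrew's monotone chain algorithm, a standard choice that the text does not prescribe. The "without loss of generality" is realized by walking clockwise whenever the counterclockwise successor of $x$ would make $p_1p_2$ an input segment. This always works because $x$ is an endpoint of exactly one segment, so at most one of its two hull edges is a segment. Building $\mathcal{D}$ and the red/black coloring is omitted, and the function names are hypothetical.

```python
def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def initial_labeling(segments, x):
    """Labels x = p_1, ..., p_m of conv(S) with p_1 p_2 not an input
    segment, plus the initial orientation u."""
    hull = convex_hull([p for s in segments for p in s])
    i = hull.index(x)                      # x must be a hull vertex
    order = hull[i:] + hull[:i]            # counterclockwise, starting at x
    seg_set = {frozenset(s) for s in segments}
    if frozenset((order[0], order[1])) in seg_set:
        order = [order[0]] + order[:0:-1]  # walk around clockwise instead
    u = {p: +1 for p in order}
    u[x] = -1
    return order, u

segs = [((0, 0), (4, 0)), ((1, 1), (3, 1)), ((0, 4), (4, 4))]
order, u = initial_labeling(segs, (0, 0))
print(order)  # [(0, 0), (0, 4), (4, 4), (4, 0)]
```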




Fig. 4. An initial polygon. Red edges are dotted and point towards their anchor. 



The algorithm runs in two phases. First, we apply Both_endpoints and Extend_reflex alternately until (L5) is satisfied. This phase is guaranteed to terminate, since both operations increase $|V(P)|$, which can never exceed the number of segment endpoints. The second phase keeps the polygon $P$ intact, but subdivides the non-convex regions of $\mathcal{D}$ to ensure (L6). 

First phase. Apply Both_endpoints(P, u). Then, as long as there is a reflex vertex $p_i$ of $P$ and a ray $r_i$ emanating from $p_i$ such that $r_i$ partitions the angle $\angle p_{i+1}p_ip_{i-1}$ into two convex angles and hits a segment $s \in S$ with $\overline{s} \subset \mathrm{int}(P)$, apply Extend_reflex(P, u, i, r_i) followed by Both_endpoints(P, u). For each basic operation, modify $\mathcal{D}$ as follows: replace every $D \in \mathcal{D}$ by the polygons $D_1, D_2, \ldots, D_\ell$ that enclose the connected components of $(D \cap \mathrm{int}(P)) \setminus \overline{S}$. 

Suppose that a basic operation replaces an edge $yz$ by an open polygonal chain $\chi = (y_1 = y, y_2, y_3, \ldots, y_k = z)$. Recall that $\chi$ is a convex chain for Build_cap, and that it consists of two convex chains connected by a segment edge in the case of Extend_reflex. Notice that each $D_i$, $i = 1, 2, \ldots, \ell$, has a common edge with $\chi$. Suppose w.l.o.g. that $t(D)$ is a vertex of $D_1$, and let $y_w$ be the last vertex of $\chi$ that is incident to $D_1$. For every polygon $D_i$, $1 < i \le \ell$, let $a_i$ be the closest vertex and $b_i$ the second closest vertex of $D_i$ to $y_w$ along $\chi$. Set $t(D_i) := b_i$, color the edge $a_ib_i$ red, and set its anchor to $a_i$. All other edges of $\chi$ are colored black. See Fig. 3 for an illustration. 

Second phase. For every reflex vertex $p_i$ of every non-convex polygon $D \in \mathcal{D}$, let $r_i$ be a ray emanating from $p_i$ that partitions the angle $\angle p_{i+1}p_ip_{i-1}$ into two convex angles but does not hit any other vertex of $P$. By the end condition of the first phase, $r_i$ does not hit any segment $s \in S$ before reaching the boundary of $D$ again. We first process all such rays whose source vertex is not assigned to any region of $\mathcal{D}$. After all these rays have been processed, we consider the remaining rays sequentially. Processing a ray $r_i$ means splitting the current region $D$ along $r_i$ into two new regions $D_1$ and $D_2$. 

Fig. 5. Partitioning a region D by rays emanating from reflex vertices. 

When splitting a region $D \in \mathcal{D}$ into two regions $D_1$ and $D_2$ along a ray $r_i$, update the assignment $t$ as follows. Assuming, without loss of generality, that $t(D)$ is a vertex of $D_1$, let $t(D_1) := t(D)$ and $t(D_2) := p_i$. See Fig. 5. 

4.1 Proof of Lemma 1 

We have to show that the output of the algorithm described above satisfies all the properties of Lemma 1. Initially, $P = \mathrm{conv}(S)$ is clearly a necklace polygon. Since we modify the polygon only by the basic operations, $P$ remains a necklace throughout the algorithm. 

It is easy to see that (L1)–(L4) hold after initialization. (Coincidentally, (L6)–(L8) also hold.) We need to argue that these four properties are maintained by the algorithm, and that the second phase additionally maintains (L5). In fact, it is enough to check that the updates associated with the basic operations maintain these properties. We point out two benefits of our basic operations. 

(i) At every reflex vertex $p$ of $P$, the segment $qp \in S$ is an edge of $P$. 

(ii) No operation replaces the red edge incident to $x$ in the initial polygon. 

Claim (ii) implies that (L1) is maintained: $x$ is always incident to a red edge, and $\deg_P(x) = 2$ initially. Since $x$ is a vertex of the convex hull, no operation appends a path that would revisit $x$. Thus, its degree remains 2. 

For (L2), notice that throughout the first phase every region $D \in \mathcal{D}$ is bounded by edges of $E(P) \cup S$, so the minimal cycles correspond to the polygons of $\mathcal{D}$. Whenever a polygon $D$ is replaced by the connected components of the region $(D \cap \mathrm{int}(P)) \setminus \overline{S}$, one new edge is colored red in each newly created cycle of $E(P) \cup S$. The second phase does not change the structure of cycles. 

(L3) is satisfied by the initial polygon, and the property is clearly maintained in both phases. 

(L4) holds initially: every vertex of $P = \mathrm{conv}(S)$ has degree 2, and a vertex assigned to a region $D \in \mathcal{D}$ is assigned to that one region only and is the anchor of a red edge. 
During the first phase, every minimal cycle of $E(P) \cup S$ is a polygon of $\mathcal{D}$. Consider the moment when a vertex $p_i$ is assigned to a new region $D \in \mathcal{D}$: the red edge $p_iq$ of $D$ is placed incident to $t(D) = p_i$, but its anchor is set to $q$. Vertex $q$ is either an endpoint of a segment diagonal or a degree four vertex of $P$. In either case, $q$ is a convex vertex of both incident regions from $\mathcal{D}$. Hence, there are only two ways to replace the edge $p_iq$. One is Build_cap(P, u, i), if $p_i$ is convex in $D$ and $p_{i+u(p_i)} = q$. The other is Extend_reflex(P, u, i, r_i), if $p_i$ is reflex in $D$ and there is a ray $r_i$ from $p_i$ that hits a segment interior to $D$. In both cases, after such an operation $p_i$ is a convex vertex of $P$ with $\deg_P(p_i) = 2$. Otherwise, the red edge $p_iq$ stays incident to $p_i$ and thus compensates for the assignment $t(D) = p_i$. 

Apart from the region assignment, there are vertices of degree two and four in $P$. If $\deg_P(p) = 2$, then (L4) holds, since $p$ can be incident to at most three edges of $E(P) \cup S$. If $\deg_P(p) = 4$, then $p$ appears as a reflex vertex in $P$. By Claim (i), the input segment incident to $p$ is in $E(P)$, so $p$ is incident to 4 edges in $E(P) \cup S$. In our operations, the degree of $p$ can go up to 4 only if a geodesic curve $\mathrm{geo}(g)$ passes through a reflex vertex at $p$. In this case, $(D \cap \mathrm{int}(P)) \setminus \overline{S}$ has two disjoint components incident to $p$. The vertex $p \in \mathrm{geo}(g)$ is the vertex of one of these regions, say $D_i$, that is closest to the region $D_1$ containing $t(D)$. Therefore, a visibility edge $pq$ incident to $p$ in $D_i$ is colored red and anchored at $p$. Moreover, $t(D_i)$ is set to $q$, so this case matches the reasoning above about region assignments. In particular, if there are two adjacent degree four vertices along a geodesic curve, one of them is incident to two red edges. Altogether, we have shown that (L4) holds during the first phase. 

In the second phase, $E(P) \cup S$ does not change. When splitting regions of $\mathcal{D}$, some of the regions are assigned to degree two reflex vertices of $P$. Since two edges of $E(P) \cup S$ are incident to such a reflex vertex $p$, (L4) clearly holds for $p$ if at most one region is assigned to $p$. Assume that after a split $D \to D_1, D_2$, two regions are assigned to a reflex vertex $p$. This can only happen if we split along a ray emanating from $p$ and $t(D) = t(D_1) = t(D_2) = p$. We have argued above that $p$ is incident to a red edge in this case, which compensates for the additional region assigned to $p$. Thus, (L4) holds during and after the second phase as well. 

(L5) is the end condition of the subroutine Both_endpoints; therefore, it is satisfied at the end of the first phase. The second phase maintains (L5) as well, since the polygon $P$ remains unchanged and, by the end condition of the first phase, the splitting rays do not hit any segment. 

(L6) clearly holds at the end of the algorithm, since the second phase eliminates all reflex vertices of polygons from $\mathcal{D}$. 

(L7) holds trivially for every $t(D)$, $D \in \mathcal{D}$, that is a convex vertex of $P$. If $t(D) = p_i$ is a reflex vertex of $P$, consider the inverse wedge $W(p_i)$ of its exterior angle. By the end condition of the first phase, $W(p_i) \cap \mathrm{int}(D) \cap \overline{S} = \emptyset$, which proves (L7). 

Finally, we have to consider (L8). Note that this property holds vacuously throughout the first phase, since every vertex is assigned to at most one region. During the second phase, however, we might split a region $D \in \mathcal{D}$ into two regions $D_1$ and $D_2$ along a ray $r$ emanating from $p$ such that afterwards $t(D_1) = t(D_2) = p$. This can happen only if $t(D) = p$, that is, for at most one reflex vertex of every region $D$. Since the algorithm treats these vertices after all other reflex vertices to which no region is assigned, (L8) holds. This completes the proof of Lemma 1. 

5 Constructing a Pointed Binary Encompassing Tree 

In this section we use Lemma 1 to give an inductive proof of Theorem 1. For the sake of the inductive argument, we actually prove a stronger theorem: 

Theorem 3. For any finite set $S$ of disjoint line segments in the plane and any vertex $x$ of $\mathrm{conv}(S)$, there exists a pointed binary encompassing tree $T$ such that $\deg_T(x) < 3$. 

Proof. We proceed by induction on $n := |S|$. If all segments are collinear, then there is a Hamiltonian encompassing path along this line. In the general case, consider a necklace $P$ for $S$ and $x$ as claimed in Lemma 1. 

If every input segment is either an edge or a diagonal of $P$, then the black edges of $P$ together with all segment diagonals form an encompassing tree $T$. The graph is connected because of (L2), and it is pointed because $P$ is a necklace. Moreover, by (L4), at most three black edges and segment diagonals are incident to any vertex $p \in V(P)$. 

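Otherwise, there is a segment $s \in S$ that is neither an edge nor a diagonal of $P$. By (L5), $\overline{s}$ lies in the interior of a convex polygon $D \in \mathcal{D}$. For every $\mathcal{P} \subseteq \mathcal{D}$, let $S \cap \mathcal{P}$ denote the set of all segments from $S$ that lie in the interior of a region of $\mathcal{P}$. By induction, for $S \cap \mathcal{P}$ and any vertex $x$ of $\mathrm{conv}(S \cap \mathcal{P})$, there is a pointed binary encompassing tree $T_{\mathcal{P}}(x)$ such that $\deg_{T_{\mathcal{P}}(x)}(x) < 3$. Consider a region $D \in \mathcal{D}$ with $S \cap D := S \cap \{D\} \neq \emptyset$. 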

If vertex $p = t(D)$ is not assigned to any region other than $D$, add a tangent $px_D$ from $p$ to $\mathrm{conv}(S \cap D)$, and add $T_{\{D\}}(x_D)$ to the tree $T$. Vertex $x_D$ is pointed in the resulting tree, since $px_D$ is tangent to $\mathrm{conv}(S \cap D)$. Also, $\deg_T(x_D) \le 3$ because $\deg_{T_{\{D\}}(x_D)}(x_D) < 3$. Vertex $p$ remains pointed by (L7), and its degree is at most three by (L4). 

It remains to consider the case that $p = t(D_1) = t(D_2)$ is assigned to two different regions $D_1$ and $D_2$ of $\mathcal{D}$. If one of $S \cap D_1$ or $S \cap D_2$ is empty, we can proceed as above. Hence, assume that both $S \cap D_1$ and $S \cap D_2$ are non-empty. We distinguish two cases. 

Case 1: There is a line $\ell$ through $p$ such that all edges incident to $p$ in $T$ and part of both $S \cap D_1$ and $S \cap D_2$ are on one side of $\ell$ (Fig. 6(a)). According to (L7), there are two tangents, $t_1 = px_1$ from $p$ to $\mathrm{conv}(S \cap D_1)$ and $t_2 = px_2$ from $p$ to $\mathrm{conv}(S \cap D_2)$, such that $p$ remains pointed if both $t_1$ and $t_2$ are added to $T$. As above, we also add $T_{\{D_1\}}(x_1)$ and $T_{\{D_2\}}(x_2)$ to $T$ and observe that all vertices involved satisfy the required degree and pointedness conditions. 

Case 2: There is a line $\ell$ through $p$ such that both $S \cap D_1$ and $S \cap D_2$ are on one side of $\ell$, and all edges incident to $p$ in $P$ are on the opposite side of $\ell$ (Fig. 6(b)). In this case, $\mathrm{conv}(S \cap \{D_1, D_2\}) \subset \mathrm{int}(D_1 \cup D_2)$ follows from (L8). Add a tangent $px$ from $p$ to $\mathrm{conv}(S \cap \{D_1, D_2\})$, and add $T_{\{D_1, D_2\}}(x)$ to $T$. By (L4), the degree of $p$ in the resulting tree is at most two, and therefore $p$ is also pointed. 





Fig. 6. Attaching encompassing trees to $p = t(D_1) = t(D_2)$: (a) Case 1; (b) Case 2. 



After considering every region $D \in \mathcal{D}$, the black edges of the necklace $P$, the input segments, the tangents, and the encompassing trees for the subproblems together form a pointed encompassing tree of maximum degree three. Since vertex $x$ is not assigned to any region from $\mathcal{D}$, it has maximum degree two according to (L1). □ 

Analysis. The time complexity of our algorithm depends on the best available algorithms for the operations we use. We assume that our polygons are represented in a doubly connected edge list, so that we can easily replace an edge by a polygonal path or retract simple subpolygons of a necklace. Maintaining the partition $\mathcal{D}$ and the labels $t$ and $\gamma$ requires linear time and space. We also maintain, for every segment and vertex, a flag that records its saturation status. 

We use ray shooting queries in operations in both phases. Every ray is an extension of an input line segment. In the first phase we apply Extend_reflex if an extension of a line segment incident to a reflex vertex $p_i$ hits another line segment lying in the interior of the face $D \in \mathcal{D}$ incident to $p_i$. An $O(n \log n)$ time sweep-line algorithm can precompute all pairs of segments such that the extension of one hits the other. We also maintain a dynamic point location data structure [8] for the polygons in $\mathcal{D}$ with $O(\log n)$ update time and $O(\log^2 n)$ query time. For every ray $r_i$, we look up the first segment $s_i$ hit from the precomputed list. If $s_i$ is unsaturated, then a point location query for $r_i \cap s_i$ tells whether $s_i$ lies in the face incident to $p_i$. The total time spent on ray-shooting related operations in all first phases throughout the algorithm amounts to $O(n \log^2 n)$. 
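The precomputation of which segment each extension ray hits first can be illustrated with a brute-force Python sketch. The text uses an $O(n \log n)$ sweep-line algorithm. The sketch below is a plainly different and slower technique: it tests every ray against every segment in $O(n^2)$ time and simply ignores rays parallel to a segment. The names and the dictionary keyed by (segment index, endpoint index) are illustrative assumptions.

```python
def cross(u, v):
    return u[0] * v[1] - u[1] * v[0]

def first_hit(origin, direction, segments, skip):
    """(index, point) of the first segment other than `skip` hit by the ray
    origin + t * direction, t > 0, or None if no segment is hit."""
    best = None
    for j, (a, b) in enumerate(segments):
        if j == skip:
            continue
        e = (b[0] - a[0], b[1] - a[1])
        denom = cross(direction, e)
        if denom == 0:
            continue  # parallel or collinear: ignored in this sketch
        ao = (a[0] - origin[0], a[1] - origin[1])
        t = cross(ao, e) / denom          # parameter along the ray
        s = cross(ao, direction) / denom  # parameter along segment ab
        if t > 0 and 0 <= s <= 1 and (best is None or t < best[0]):
            best = (t, j)
    if best is None:
        return None
    t, j = best
    return j, (origin[0] + t * direction[0], origin[1] + t * direction[1])

def extension_hits(segments):
    """For each segment i and endpoint k, the first segment hit by the
    extension of segment i beyond that endpoint."""
    hits = {}
    for i, seg in enumerate(segments):
        for k in (0, 1):
            p, other = seg[k], seg[1 - k]
            direction = (p[0] - other[0], p[1] - other[1])
            hits[(i, k)] = first_hit(p, direction, segments, i)
    return hits

segs = [((0, 0), (1, 0)), ((3, -1), (3, 1)), ((-2, -1), (-2, 1))]
print(extension_hits(segs)[(0, 1)])  # (1, (3.0, 0.0))
```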

In the second phase, we partition every non-convex face $D \in \mathcal{D}$ along rays emanating from reflex vertices. Two line-sweep algorithms (left-to-right and right-to-left) complete this partition in $O(n_D \log n_D)$ time, where $n_D$ denotes the size of $D$. Thus the total runtime of all second phases throughout the algorithm is $O(n \log n)$. 
The computation of geodesic paths, also known as shortest paths homotopic to an input path, has been the focus of recent studies [9,4]. The convex hull of the segments lying in the interior of a polygon $D \in \mathcal{D}$ can also be computed by such a query. The total size of all geodesic paths in our algorithm is $O(n)$, since every vertex on a geodesic path becomes saturated only once. Moreover, all our query paths are simple and pairwise non-crossing, and their total size is $O(n)$. The algorithm of Bespamyatnikh [4] computes shortest homotopic paths in the presence of $n$ point obstacles in $O(n^{4/3} \operatorname{polylog} n)$ time if the total size of both the input and the output is linear. (Note that if the query paths were given in advance, then the geodesic paths could be computed in $O(n \operatorname{polylog} n)$ time [4].) 

Theorem 4. A pointed binary encompassing tree for $n$ disjoint segments in the plane can be computed in $O(n^{4/3} \operatorname{polylog} n)$ time. 



6 Bounded Degree Pseudo-Triangulations for Segments 

A careful analysis reveals that Theorem 1 holds in the following form. 

Theorem 5. For any finite set of disjoint line segments in the plane there exists 
a pointed binary encompassing tree such that the maximum degree is at most three 
and if a convex hull vertex has degree three then at least one of its incident edges 
is part of the convex hull. 

We combine this theorem with the algorithm of Aichholzer et al. [2], according 
to which a simple polygon can be pseudo-triangulated such that the degree of 
every convex vertex is at most four and the degree of every reflex vertex is at 
most five, that is, every convex (reflex) vertex has at most two (three) new 
incident edges in addition to the two incident polygon edges. 

We apply this result to each polygon into which the tree of Thm. 5 dissects 
P and obtain an upper bound of 10. On the other hand, there is a lower 
bound construction that consists of ten disjoint segments and forces a vertex 
of degree at least six in every encompassing pseudo-triangulation (cf. the full 
version of [2]). 
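The bound of ten can be traced by counting the edges at a single vertex $v$. The text above does not spell this accounting out, so it is only one plausible reading. It assumes the polygons are the faces into which the tree, together with the boundary of $\mathrm{conv}(S)$, dissects the hull. It also uses the per-vertex budgets of [2] quoted above: at most two new edges per convex corner and three per reflex corner. Since $v$ is pointed, at most one of its angles exceeds $\pi$.

```latex
% Interior vertex with \deg_T(v) = 3: one reflex angle and two convex angles
\deg(v) \le \underbrace{3}_{\text{tree edges}} + \underbrace{3}_{\text{reflex angle}} + \underbrace{2 + 2}_{\text{convex angles}} = 10
% Hull vertex with \deg_T(v) = 3, one tree edge on the hull:
% one further hull edge and three convex angles inside conv(S)
\deg(v) \le (3 + 1) + 3 \cdot 2 = 10
% Hull vertex with \deg_T(v) = 3 and no tree edge on the hull:
% two further hull edges and four convex angles
\deg(v) \le (3 + 2) + 4 \cdot 2 = 13
```

Under this accounting, the last line suggests why Theorem 5 includes its condition on convex hull vertices of degree three.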
Abstract. Given a hypergraph $\mathcal{H} = (V, \mathcal{F})$ and a $[0,1]$-valued vector $\boldsymbol{a} \in [0,1]^V$, its global rounding is a binary (i.e., $\{0,1\}$-valued) vector $\alpha \in \{0,1\}^V$ such that $\left|\sum_{v \in F} (\alpha(v) - \boldsymbol{a}(v))\right| < 1$ holds for each $F \in \mathcal{F}$. 

We study the geometric (or combinatorial) structure of the set of global roundings of $\boldsymbol{a}$ using the notion of a compatible set with respect to the discrepancy distance. We conjecture that the set of global roundings forms a simplex if the hypergraph satisfies “shortest-path” axioms, and we prove this for some special cases, including some geometric range spaces and the shortest path hypergraph of a series-parallel graph. 



1 Introduction 

Rounding is a central problem in computer science and computer engineering. Given a real number $a$, its rounding is either its floor $\lfloor a \rfloor$ or its ceiling $\lceil a \rceil$. We want to consider how to round a set of $n$ real numbers, each of which is assigned to an element of a set $V = \{v_1, v_2, \ldots, v_n\}$ with a given structure. We can assume that each number is in the range $[0, 1]$, so that the input set can be considered as $\mathbf{a} \in [0,1]^V$ and the output rounding is $\alpha \in \{0,1\}^V$. Throughout this paper, we use a Greek (resp. bold) character to represent a binary (resp. real-valued) function on $V$.

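We assume that the structure on $V$ is represented by a hypergraph $\mathcal{H} = (V, \mathcal{F})$, where $\mathcal{F} \subseteq 2^V$ is the set of hyperedges. For simplicity, we assume without loss of generality that $\mathcal{F}$ contains all the singletons. We say $\alpha$ is a global rounding of $\mathbf{a}$ iff $w_F(\alpha) = \sum_{v \in F} \alpha(v)$ is a rounding (i.e., either floor or ceiling) of $w_F(\mathbf{a}) = \sum_{v \in F} \mathbf{a}(v)$ for each $F \in \mathcal{F}$. Let $\Gamma_{\mathcal{H}}(\mathbf{a})$ be the set of all global roundings of $\mathbf{a}$.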
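We can rephrase the global rounding condition as $D_{\mathcal{H}}(\alpha, \mathbf{a}) < 1$, where $D_{\mathcal{H}}$ is the discrepancy distance, defined for $\mathbf{a}, \mathbf{b} \in [0,1]^V$ by

$$D_{\mathcal{H}}(\mathbf{a}, \mathbf{b}) = \max_{F \in \mathcal{F}} |w_F(\mathbf{a}) - w_F(\mathbf{b})|.$$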
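For very small instances, the definitions can be checked by brute force. The sketch below is illustrative only: vertices are indexed 0..n-1 (an assumption; the text uses $v_1, \ldots, v_n$), the function names are hypothetical, and exact `Fraction` arithmetic is used so that sums such as $w_F(\mathbf{a}) = 1$ are not perturbed by floating-point error. It enumerates all $2^n$ binary vectors and keeps those with $D_{\mathcal{H}}(\alpha, \mathbf{a}) < 1$.

```python
from fractions import Fraction
from itertools import product

def weight(x, F):
    """w_F(x): sum of x(v) over the vertices v of hyperedge F."""
    return sum(x[v] for v in F)

def discrepancy(x, y, hyperedges):
    """D_H(x, y) = max over hyperedges F of |w_F(x) - w_F(y)|."""
    return max(abs(weight(x, F) - weight(y, F)) for F in hyperedges)

def global_roundings(a, hyperedges):
    """Gamma_H(a): all binary alpha with D_H(alpha, a) < 1.
    Brute force over 2^n candidates, so only usable for small n."""
    return [alpha for alpha in product((0, 1), repeat=len(a))
            if discrepancy(alpha, a, hyperedges) < 1]

def intervals(n):
    """Hyperedges of I_n (0-indexed): all intervals [i, j] with 0 <= i <= j < n."""
    return [tuple(range(i, j + 1)) for i in range(n) for j in range(i, n)]

# Usage: a = (1/2, 1/2, 1/2) on I_3 has exactly two sequence roundings.
a = [Fraction(1, 2)] * 3
print(global_roundings(a, intervals(3)))  # [(0, 1, 0), (1, 0, 1)]
```

Here $w_{\{1,2\}}(\mathbf{a}) = 1$ forces $\alpha(1) + \alpha(2) = 1$ exactly, which is why only two of the eight candidates survive.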

T. Hagerup and J. Katajainen (Eds.): SWAT 2004, LNCS 3111, pp. 455—467, 2004. 
Thus, $\Gamma_{\mathcal{H}}(\mathbf{a})$ is the set of integral points in the open unit ball about $\mathbf{a}$ when $D_{\mathcal{H}}$ is taken as the distance. $\Gamma_{\mathcal{H}}(\mathbf{a}) \neq \emptyset$ for every $\mathbf{a}$ iff $\mathcal{H}$ is unimodular (i.e., its incidence matrix is totally unimodular) [5]. However, apart from this unimodularity condition for nonemptiness and some results on its cardinality, little is known about the structure of $\Gamma_{\mathcal{H}}(\mathbf{a})$. We remark that $\sup_{\mathbf{a} \in [0,1]^V} \min_{\alpha \in \{0,1\}^V} D_{\mathcal{H}}(\alpha, \mathbf{a})$ is the linear discrepancy of $\mathcal{H}$, which is considered a key concept in hypergraph theory and combinatorial geometry [5,7,11].

In this paper, we study the geometric properties of $\Gamma_{\mathcal{H}}(\mathbf{a})$. We say that a hypergraph $\mathcal{H}$ has the simplex property if $\Gamma_{\mathcal{H}}(\mathbf{a})$, regarded as a set of $n$-dimensional vectors, is affine independent for every $\mathbf{a} \in [0,1]^V$. If we regard $\Gamma_{\mathcal{H}}(\mathbf{a})$ as a set of points, the simplex property means that the set forms the vertex set of a (possibly lower-dimensional) simplex whenever it is nonempty. Our main aim is to investigate classes of hypergraphs that have the simplex property. The global rounding condition can be written directly as an integer program, so from the viewpoint of mathematical programming, we obtain interesting classes of integer programming problems whose solution space is a simplex, while the corresponding LP polytope is not always a simplex.
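The simplex property can be tested mechanically on small examples. The sketch below (hypothetical function names, exact rational arithmetic) uses the standard criterion that points $p_1, \ldots, p_m$ are affine independent iff the lifted vectors $(p_k, 1)$ are linearly independent, which is equivalent to the coefficient definition recalled in Section 2.2.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of rational row vectors via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def affinely_independent(points):
    """True iff the lifted vectors (p, 1) are linearly independent (rank == m).
    The empty set is treated as affine independent."""
    if not points:
        return True
    return rank([list(p) + [1] for p in points]) == len(points)

# The zero vector and the unit vectors of R^3 span a 3-simplex;
# the four vertices of the unit square do not form a simplex.
print(affinely_independent([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
print(affinely_independent([(0, 0), (1, 0), (0, 1), (1, 1)]))              # False
```

Since an affine independent set of $n$-dimensional vectors has at most $n + 1$ members, a hypergraph with the simplex property never has more than $n + 1$ global roundings for any input.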

The simplex property is motivated by recent results on the maximum number $\mu(\mathcal{H}) = \max_{\mathbf{a} \in [0,1]^V} |\Gamma_{\mathcal{H}}(\mathbf{a})|$ of global roundings. $\mu(\mathcal{H})$ can never be less than $n + 1$ for any hypergraph, since the $n$ unit vectors and the zero vector always belong to $\Gamma_{\mathcal{H}}(\mathbf{a})$ for a suitable $\mathbf{a}$. In general, $\mu(\mathcal{H})$ may be exponential in $n$. However, Sadakane et al. [13] discovered that $\mu(\mathcal{I}_n) = n + 1$, where $\mathcal{I}_n$ is the hypergraph on $V = \{1, 2, \ldots, n\}$ with the edge set $\{[i, j] : 1 \le i \le j \le n\}$ consisting of all subintervals of $V$. A corresponding global rounding is called a sequence rounding, which is a convenient tool for digitizing a sequence of analogue data.
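The lower bound $\mu(\mathcal{H}) \ge n + 1$ can be checked directly. In the sketch below, the choice $\varepsilon = 1/(n+1)$ is ours (any $\varepsilon < 1/n$ works) and the function names are hypothetical. Because the check runs over every nonempty subset of $V$, it covers every hypergraph on $V$ at once: with $\mathbf{a} = (\varepsilon, \ldots, \varepsilon)$ we have $0 < w_F(\mathbf{a}) < 1$ for all $F$, so both 0 and 1 are valid roundings of every $w_F(\mathbf{a})$.

```python
from fractions import Fraction
from itertools import combinations

def lower_bound_witness(n):
    """Return a = (eps, ..., eps) with eps = 1/(n+1) < 1/n, together with the
    zero vector and the n unit vectors, which should all be global roundings of a."""
    eps = Fraction(1, n + 1)
    a = [eps] * n
    candidates = [tuple(0 for _ in range(n))]
    candidates += [tuple(1 if k == i else 0 for k in range(n)) for i in range(n)]
    return a, candidates

def is_rounding_for_all_subsets(alpha, a):
    """Check |w_F(alpha) - w_F(a)| < 1 for every nonempty F subset of V,
    hence for every hypergraph on V."""
    n = len(a)
    for size in range(1, n + 1):
        for F in combinations(range(n), size):
            if abs(sum(alpha[v] for v in F) - sum(a[v] for v in F)) >= 1:
                return False
    return True

a, cands = lower_bound_witness(4)
print(len(cands), all(is_rounding_for_all_subsets(c, a) for c in cands))  # 5 True
```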

Given this discovery, it is natural to ask for which classes of hypergraphs the property $\mu(\mathcal{H}) = n + 1$ holds. Moreover, there should be a combinatorial (or geometric) reason why $\mu(\mathcal{H}) = n + 1$ holds for those hypergraphs. Naturally, the simplex property implies $\mu(\mathcal{H}) = n + 1$, since a $d$-dimensional simplex has $d + 1$ vertices and an affine independent set of $n$-dimensional vectors has at most $n + 1$ elements; indeed, $\mathcal{I}_n$ has the simplex property.



Shortest-Path Hypergraphs and Range Spaces 

$\mathcal{I}_n$ has $n(n+1)/2$ hyperedges, and the authors do not know any hypergraph with fewer than $n(n+1)/2$ hyperedges (including the $n$ singletons) that has the simplex property. Thus, it is reasonable to consider some natural classes of hypergraphs with $n(n+1)/2$ hyperedges.

Consider a connected graph $G = (V, E)$ in which each edge $e$ has a positive length $\ell(e)$. We fix a total ordering on $V$, and write $V = \{v_1, v_2, \ldots, v_n\}$. This ordering is inherited by any subset of $V$. For each pair $(v_i, v_j)$ of vertices in $V$ such that $i < j$, let $p(v_i, v_j)$ be the shortest path between them. If there is more than one shortest path between them, we consider the lexicographic ordering among the paths induced from the ordering on $V$, and select the first one in this ordering. Let $P(v_i, v_j)$ be the set of vertices on $p(v_i, v_j)$, including the terminal nodes $v_i$ and $v_j$. We also define $P(v, v) = \{v\}$ for each $v \in V$. Let $\mathcal{F}(G) = \{P(v_i, v_j) : 1 \le i \le j \le n\}$, and call $\mathcal{H}(G) = (V, \mathcal{F}(G))$ the shortest-path hypergraph associated with $G$.
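A brute-force construction of $\mathcal{H}(G)$ for tiny graphs can make the tie-breaking rule concrete. This is a sketch under stated assumptions, not an efficient method: vertices are 0..n-1 with the natural order as the fixed total ordering, edge lengths are given in a dict, all simple paths are enumerated, and the minimum is taken on the key (total length, vertex sequence), so Python's tuple comparison implements the lexicographic tie-break. The function names are hypothetical.

```python
from itertools import combinations

def simple_paths(adj, s, t):
    """Yield every simple path from s to t as a tuple of vertex indices (DFS)."""
    stack = [(s, (s,))]
    while stack:
        v, path = stack.pop()
        if v == t:
            yield path
            continue
        for w in adj[v]:
            if w not in path:
                stack.append((w, path + (w,)))

def shortest_path_hypergraph(n, edges):
    """Hyperedges of H(G) for a connected G on vertices 0..n-1.
    edges maps a pair (u, v) to a positive length."""
    adj = {v: [] for v in range(n)}
    length = {}
    for (u, v), l in edges.items():
        adj[u].append(v)
        adj[v].append(u)
        length[frozenset((u, v))] = l

    def key(path):
        total = sum(length[frozenset(e)] for e in zip(path, path[1:]))
        return (total, path)          # shortest first, then lexicographically first

    hyperedges = {frozenset([v]) for v in range(n)}   # P(v, v) = {v}
    for i, j in combinations(range(n), 2):            # pairs with i < j
        hyperedges.add(frozenset(min(simple_paths(adj, i, j), key=key)))
    return hyperedges

# Unit-length 4-cycle 0-1-2-3-0: the ties for (0, 2) and (1, 3) are broken
# lexicographically, giving {0, 1, 2} and {0, 1, 3}.
cycle = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
H = shortest_path_hypergraph(4, cycle)
print(len(H), frozenset({0, 1, 2}) in H, frozenset({0, 1, 3}) in H)  # 10 True True
```

On a path graph the same routine returns exactly the intervals, i.e., $\mathcal{I}_n$.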

Conjecture 1 (Weaker conjecture). $\mu(\mathcal{H}(G)) = n + 1$ if $G$ is a connected graph with $n$ vertices.

The conjecture was given in [1] and has been proved for trees, cycles, and outerplanar graphs [1,14]. Note that $\mathcal{H}(G) = \mathcal{I}_n$ if $G$ is a path. However, those proofs are complicated and case-dependent. We try to establish a more structured theory by considering the following deeper conjecture:

Conjecture 2 (Stronger conjecture). For any connected graph $G$, the shortest-path hypergraph $\mathcal{H}(G)$ has the simplex property.

This conjecture was proposed in [1] by the authors, where the simplex property was called the “affine independence property”. So far, the conjecture has been proved only for trees, unweighted complete graphs, and unweighted (square) meshes. We prove that the simplex property is invariant under some graph-theoretic connection operations, and as a consequence, we show that the conjecture holds for series-parallel graphs.

In addition to significantly extending the classes of hypergraphs for which the weaker and the stronger conjectures are verified, our theory also simplifies the proofs of known results. For example, the fact that the weaker conjecture holds for cycles is one of the main results of [1], and its proof therein is quite involved. In our framework, it is almost trivial that the stronger conjecture holds for cycles (see Section 3).

From a computational-geometric viewpoint, $\mathcal{I}_n$ can be considered as the 1-dimensional range space corresponding to intervals, and thus we try to extend the theory to geometric range spaces. We generalize the argument for $\mathcal{H}(G)$ to axiomatic shortest-path hypergraphs (defined later), and prove the simplex property for some geometric range spaces such as the space of isothetic right-angle triangles.



Algorithmic Implication 

The theory is not only combinatorially interesting but also applicable to algorithm design for rounding problems. In general, given a system of functions and a real input vector $\mathbf{a}$, we seek an integer output vector $\alpha$ such that the difference between the values of each function at $\mathbf{a}$ and at $\alpha$ is less than a given threshold. The algorithmic question of how to obtain a low-discrepancy rounding of a given $\mathbf{a}$ is the case where each function is a binary-coefficient linear function associated with a hyperedge, and it is important in several applications.

For example, consider the problem of digital halftoning in image processing, where the gray-scale value of each pixel has to be rounded to a binary value. This problem is formulated as that of obtaining a low-discrepancy rounding, in which the hypergraph corresponds to a family of certain local sets of pixels, and several methods have been proposed [2,3,12]. Unfortunately, for a general hypergraph, it is NP-complete to decide whether a given input $\mathbf{a}$ has a global rounding or not, and hence it is NP-hard to compute a rounding with the minimum discrepancy [3]. Thus, a practical approach is to consider a special hypergraph for which we can compute a low-discrepancy rounding efficiently.

It is folklore that the unimodularity condition means that the vertices of the ball (w.r.t. $D_{\mathcal{H}}$) are integral, so an LP solution automatically gives an IP solution. Thus, a global rounding always exists and can be computed in polynomial time if $\mathcal{H}$ is unimodular, and therefore unimodular hypergraphs are mainly considered in the literature [2,3,8].

Here, we consider another case where an integer programming problem can be 
solved in polynomial time: If the number of integral points in the solution space 
is small (i.e. of polynomial size), and there is an enumeration algorithm that is 
polynomial in the output size (together with the input size), we can solve the 
problem in polynomial time. We show that enumeration of all global roundings 
can be done in polynomial time for several (non-unimodular) hypergraphs with 
the simplex property by applying this framework. 



2 Combinatorial and Linear Algebraic Tools 

2.1 Compatible Set Representing Global Roundings 

The set of binary functions on $V$ can be regarded as the $n$-dimensional hypercube $C_n = \{0,1\}^n$, where $n = |V|$. Consider an integer-valued distance $f$ on $C_n$. We call a subset $A$ of $C_n$ a compatible set with respect to $f$ if $f(x, y) = 1$ for any pair $x \neq y$ of $A$. In other words, $A$ is a compatible set if and only if it is a unit-diameter set. The properties of a compatible set depend heavily on $f$: if $f$ is the $L_\infty$ distance, the hypercube itself is a compatible set, while the cardinality of a compatible set for the Hamming distance is at most two. By definition, $D_{\mathcal{H}}$ gives an integer-valued distance on the hypercube $C_n$.

Definition 1. A set of binary functions on $V$ is called $\mathcal{H}$-compatible if it is a compatible set with respect to $D_{\mathcal{H}}$. In other words, $|w_F(\alpha) - w_F(\beta)| \le 1$ holds for every hyperedge $F$ of $\mathcal{H}$ for any elements $\alpha$ and $\beta$ of the set.

$\Gamma_{\mathcal{H}}(\mathbf{a})$ is always an $\mathcal{H}$-compatible set, since the $D_{\mathcal{H}}$ distance between two global roundings must be integral and, by the triangle inequality, less than 2. Conversely, any maximal $\mathcal{H}$-compatible set is $\Gamma_{\mathcal{H}}(\mathbf{g})$, where $\mathbf{g}$ is the center of gravity of the compatible set. Thus, it suffices to show the simplex property for compatible sets instead of sets of global roundings.
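The link between compatible sets and global roundings can be checked on a small case. The sketch below (hypothetical names, 0-indexed vertices, exact fractions) tests $\mathcal{H}$-compatibility pairwise and verifies one direction of the claim: every member of a compatible set is a global rounding of its center of gravity $\mathbf{g}$. The reason is that on each hyperedge the members take at most two consecutive integer values, so $w_F(\mathbf{g})$ lies strictly between them or equals the common value. Maximality is not checked here.

```python
from fractions import Fraction
from itertools import combinations

def weight(x, F):
    """w_F(x) = sum of x(v) over v in F."""
    return sum(x[v] for v in F)

def is_compatible(A, hyperedges):
    """H-compatible: |w_F(alpha) - w_F(beta)| <= 1 for every F and all alpha, beta in A."""
    return all(abs(weight(al, F) - weight(be, F)) <= 1
               for al, be in combinations(A, 2) for F in hyperedges)

def center_of_gravity(A):
    """Componentwise mean g of the binary vectors in A, as exact fractions."""
    return [Fraction(sum(al[v] for al in A), len(A)) for v in range(len(A[0]))]

def is_global_rounding(alpha, a, hyperedges):
    """D_H(alpha, a) < 1."""
    return all(abs(weight(alpha, F) - weight(a, F)) < 1 for F in hyperedges)

I3 = [tuple(range(i, j + 1)) for i in range(3) for j in range(i, 3)]  # intervals, 0-indexed
A = [(0, 1, 0), (1, 0, 1)]
g = center_of_gravity(A)
print(is_compatible(A, I3), all(is_global_rounding(al, g, I3) for al in A))  # True True
```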



2.2 General Results on Simplex Property 

It is obvious that the simplex property is monotone, that is:

Lemma 1. If $\mathcal{H} = (V, \mathcal{F})$ has the simplex property and $\mathcal{F} \subseteq \mathcal{F}'$, then so does $\mathcal{H}' = (V, \mathcal{F}')$.
Recall that a set $A = \{\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m\}$ of vectors is affine dependent if and only if there are real numbers $c_1, c_2, \ldots, c_m$ satisfying (i) at least one of them is nonzero, (ii) $\sum_{1 \le i \le m} c_i = 0$, and (iii) $\sum_{1 \le i \le m} c_i \mathbf{a}_i = \mathbf{0}$.

A set $A$ of binary assignments on $V$ is called minimal affine dependent if it is an affine dependent set as a set of vectors in the $n$-dimensional real vector space ($n = |V|$) and every proper subset of it is affine independent.

For a binary assignment $\alpha$ on $V$ and a subset $X$ of $V$, $\alpha|_X$ denotes the restriction (projection) of $\alpha$ to $X$. Given a set $A$ of binary assignments on $V$, its restriction to $X$ is $A|_X = \{\alpha|_X : \alpha \in A\}$. Note that this set is not a multiset: we keep only a single copy even if $\alpha|_X = \beta|_X$ for different $\alpha$ and $\beta$ in $A$. For binary assignments $\alpha$ on $X$ and $\beta$ on $Y$, $\alpha \oplus \beta$ is the binary assignment on $X \cup Y$ obtained by concatenating $\alpha$ and $\beta$: that is, $(\alpha \oplus \beta)(v) = \alpha(v)$ if $v \in X$, and $(\alpha \oplus \beta)(v) = \beta(v)$ if $v \in Y$. By definition, $\alpha \oplus \beta$ is only defined if $\alpha(v) = \beta(v)$ for each $v \in X \cap Y$. The following is our key lemma:¹

Lemma 2. Let $A$ be a minimal affine dependent set on $V$, and let $V = X \cup Y$. If $A|_X$ and $A|_Y$ are affine independent, then $A|_{X \cap Y}$ contains only one assignment, as a set.

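Proof. Since $A$ is affine dependent, there exists a constant $c(\alpha)$ for each $\alpha \in A$ such that $\sum_{\alpha \in A} c(\alpha) = 0$ and $\sum_{\alpha \in A} c(\alpha)\alpha = \mathbf{0}$, and at least one $c(\alpha)$ is nonzero. We project these formulae to $X$ to obtain $\sum_{\beta \in A|_X} C(\beta) = 0$ and $\sum_{\beta \in A|_X} C(\beta)\beta = \mathbf{0}$, where $C(\beta) = \sum_{\alpha \in A,\ \alpha|_X = \beta} c(\alpha)$. Because of the affine independence of $A|_X$, $C(\beta) = 0$ for each $\beta \in A|_X$. Let us consider $\tau \in A|_{X \cap Y}$. Let $A(\tau) = \{\alpha \in A : \alpha|_{X \cap Y} = \tau\}$, and $A_X(\tau) = \{\beta \in A|_X : \beta|_{X \cap Y} = \tau\}$. We select $\tau$ such that there exists $\alpha \in A(\tau)$ satisfying $c(\alpha) \neq 0$.

Let $\eta = \sum_{\alpha \in A(\tau)} c(\alpha)\alpha$. Then $\eta|_X = \sum_{\beta \in A_X(\tau)} C(\beta)\beta$ is $\mathbf{0}$, since $C(\beta) = 0$ for each $\beta$. Similarly, $\eta|_Y = \mathbf{0}$. Thus, $\eta = \mathbf{0}$. Moreover, $\sum_{\alpha \in A(\tau)} c(\alpha) = \sum_{\beta \in A_X(\tau)} C(\beta) = 0$. This means that $A(\tau)$ is affine dependent. Because of the minimality of $A$, $A = A(\tau)$, and we have the lemma. □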
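The restriction and concatenation operations used in Lemma 2 can be sketched with assignments stored as dicts from vertex to 0/1; this representation and the function names are our assumptions. `restrict_set` deliberately returns a set, matching the convention that $A|_X$ is not a multiset; in this encoding, the conclusion of Lemma 2 reads `len(restrict_set(A, X & Y)) == 1`.

```python
def restrict(alpha, X):
    """alpha|_X for an assignment stored as a dict {vertex: 0 or 1}."""
    return {v: alpha[v] for v in X}

def restrict_set(A, X):
    """A|_X as a set (not a multiset): equal restrictions collapse to one copy."""
    return {frozenset(restrict(alpha, X).items()) for alpha in A}

def concat(alpha, beta):
    """alpha (+) beta on dom(alpha) | dom(beta); None if they disagree on the overlap,
    i.e. where the concatenation is undefined."""
    if any(alpha[v] != beta[v] for v in alpha.keys() & beta.keys()):
        return None
    return {**alpha, **beta}

A = [{0: 1, 1: 0}, {0: 1, 1: 1}]
print(len(restrict_set(A, {0})), len(restrict_set(A, {0, 1})))  # 1 2
print(concat({0: 1, 1: 0, 2: 1}, {2: 1, 3: 0}))                # {0: 1, 1: 0, 2: 1, 3: 0}
print(concat({0: 1, 2: 1}, {2: 0}))                            # None
```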

Given a subset $S$ of $V$, we can consider the induced hypergraph $\mathcal{H}|_S = (S, \mathcal{F} \cap 2^S)$. Note that each hyperedge of $\mathcal{H}|_S$ must be a hyperedge of $\mathcal{H}$ by definition. Naturally, if a set $A$ of binary assignments on $V$ is $\mathcal{H}$-compatible, then $A|_S$ is $\mathcal{H}|_S$-compatible.

By definition, a subset of a compatible set is also a compatible set. Thus, the 
concept of minimal affine dependent compatible set (possibly an empty set) is 
well defined. We have the following corollary of Lemma 2: 

Corollary 1. Consider a hypergraph $\mathcal{H} = (V, \mathcal{F})$ and a minimal affine dependent compatible set $A$. Suppose that $V = X \cup Y$ and each of $\mathcal{H}|_X$ and $\mathcal{H}|_Y$ has the simplex property. Then, for any pair $\alpha$ and $\alpha'$ in $A$, we have $\alpha(v) = \alpha'(v)$ for each $v \in X \cap Y$.



¹ This lemma is purely linear-algebraic, and can be extended to hold for a general set of real vectors.
Definition 2. A vertex $v$ of a hypergraph $\mathcal{H}$ is called a double-covered vertex if there exist subsets $X$ and $Y$ such that $V = X \cup Y$, $v \in X \cap Y$, and both $\mathcal{H}|_X$ and $\mathcal{H}|_Y$ have the simplex property. We say $S \subseteq V$ is double-covered if every element of $S$ is double-covered.



Definition 3. For a subset $S$ of vertices of a hypergraph $\mathcal{H} = (V, \mathcal{F})$, a set $A$ of assignments on $V$ is called $S$-contracted if $\alpha(v) = 0$ for each $v \in S$ and $\alpha \in A$.



Theorem 1. Let $\mathcal{H} = (V, \mathcal{F})$ be a hypergraph, and let $S \subseteq V$ be a double-covered set. Then, if every $S$-contracted compatible set is affine independent, $\mathcal{H}$ has the simplex property.

Proof. Assume on the contrary that $\mathcal{H}$ does not have the simplex property. Then there is an affine dependent compatible set, and hence a minimal affine dependent compatible set $A$. By Corollary 1, all assignments of $A$ take the same value on each element of $S$. Thus, if we replace the value by 0 at every $v \in S$, the revised set $\tilde{A}$ is also compatible and minimal affine dependent, since we subtract the same vector from each member of $A$ to obtain $\tilde{A}$. However, $\tilde{A}$ is $S$-contracted, which contradicts the hypothesis. □



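Corollary 2. If $V$ itself is double-covered, $\mathcal{H} = (V, \mathcal{F})$ has the simplex property. (Take $S = V$ in Theorem 1: a $V$-contracted set contains at most the zero assignment and is therefore affine independent.)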



2.3 Axiomatic Shortest Path Hypergraph 

Definition 4. A hypergraph $\mathcal{H} = (V, \mathcal{F})$ is called an ASP (axiomatic shortest path) hypergraph if $\mathcal{F} = \{f(u,v) \mid (u,v) \in V \times V\}$ satisfies the following conditions:

(1) $f(u,u) = \{u\}$.

(2) $f(u,v) = f(u',v')$ if and only if $\{u,v\} = \{u',v'\}$ as unordered pairs (one-to-one property).

(3) For any $s, t \in f(u,v)$, $f(s,t) \subseteq f(u,v)$ (monotonicity).

It is clear that the shortest-path hypergraph $\mathcal{H}(G)$ is an ASP hypergraph for any connected graph $G$ with any edge-length function.

Definition 5. Given an ASP hypergraph $\mathcal{H} = (V, \mathcal{F})$, a subset $S$ of $V$ is called a shortest-path-closed subset (SPC subset) if $f(u,v) \subseteq S$ for any pair $u$ and $v$ in $S$.

The following lemma is immediate from definitions. 

Lemma 3. Given an ASP hypergraph $\mathcal{H} = (V, \mathcal{F})$ and an SPC subset $S$ of $V$, $\mathcal{H}|_S$ is also an ASP hypergraph.
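Definitions 4 and 5 translate directly into checks. In this sketch, $f$ is stored as a dict keyed by unordered pairs (`frozenset({u, v})`, which is a singleton when $u = v$); the encoding and the function names are hypothetical. The usage example takes the intervals on a path 0-1-2-3 (that is, $\mathcal{I}_4$, 0-indexed) and also illustrates Lemma 3 on the SPC subset $\{1, 2, 3\}$.

```python
from itertools import combinations_with_replacement

def pair(u, v):
    """Unordered pair key; pair(u, u) is the singleton {u}."""
    return frozenset((u, v))

def is_asp(V, f):
    """Check the three axioms of Definition 4 for f: pair(u, v) -> frozenset of vertices."""
    V = sorted(V)
    pairs = [pair(u, v) for u, v in combinations_with_replacement(V, 2)]
    if any(not f[p] <= set(V) for p in pairs):                 # hyperedges lie inside V
        return False
    if any(f[pair(u, u)] != frozenset((u,)) for u in V):       # axiom (1)
        return False
    if len({f[p] for p in pairs}) != len(pairs):               # axiom (2): one-to-one
        return False
    for p in pairs:                                            # axiom (3): monotonicity
        for s, t in combinations_with_replacement(sorted(f[p]), 2):
            if not f[pair(s, t)] <= f[p]:
                return False
    return True

def is_spc(S, f):
    """Definition 5: f(u, v) is contained in S for all u, v in S."""
    S = frozenset(S)
    return all(f[pair(u, v)] <= S for u in S for v in S)

n = 4
f = {pair(u, v): frozenset(range(min(u, v), max(u, v) + 1))
     for u in range(n) for v in range(n)}
print(is_asp(range(n), f), is_spc({1, 2}, f), is_spc({0, 2}, f))  # True True False

S = [1, 2, 3]                                   # an SPC subset; Lemma 3 applies
f_S = {pair(u, v): f[pair(u, v)] for u in S for v in S}
print(is_asp(S, f_S))                           # True
```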
3 Shortest Path Hypergraphs with the Simplex Property

Definition 6. A subgraph $G' = (V', E')$ of $G = (V, E)$ is called an SPC subgraph if any shortest path in $G'$ is a shortest path in $G$.

The following two lemmas are immediate from definitions: 

Lemma 4. If $G' = (V', E')$ is an SPC subgraph of $G = (V, E)$, then $V'$ is an SPC subset of $V$ with respect to $\mathcal{H}(G)$, and $\mathcal{H}(G)|_{V'} = \mathcal{H}(G')$.

Lemma 5. Consider $\mathcal{H} = \mathcal{H}(G)$ for $G = (V, E)$. Let $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ be SPC subgraphs such that $V_1 \cup V_2 = V$. If both $\mathcal{H}(G_1)$ and $\mathcal{H}(G_2)$ have the simplex property, then each vertex in $V_1 \cap V_2$ is double-covered.

Proposition 1. If $G$ is a cycle, $\mathcal{H}(G)$ has the simplex property.

Proof. We give a cyclic ordering $v_1, v_2, \ldots, v_n$ of the vertices. For the vertex $v_1$, let $V_1 = \{v_1, v_2, \ldots, v_k\}$ and $V_2 = \{v_{k+1}, v_{k+2}, \ldots, v_n, v_1\}$, where $k$ is the largest index for which the shortest path from $v_1$ to $v_k$ goes through $v_2$. Let $G_1$ and $G_2$ be the induced subgraphs associated with $V_1$ and $V_2$, respectively. Since $G_1$ and $G_2$ are paths, it is known [1] that $\mathcal{H}(G_1)$ and $\mathcal{H}(G_2)$ have the simplex property. It is clear that $G_1$ and $G_2$ are SPC subgraphs, $V_1 \cap V_2 = \{v_1\}$, and $V_1 \cup V_2 = V$. Thus, from Lemma 5, $v_1$ is double-covered. This argument holds for any cyclic ordering, and thus every vertex of $V$ is double-covered. Thus, from Corollary 2, $\mathcal{H}(G)$ has the simplex property. □

Definition 7. A graph $G = (V, E)$ is a series connection of two subgraphs $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$ if there exists a vertex (cut vertex) $v$ such that $V = V_1 \cup V_2$, $V_1 \cap V_2 = \{v\}$, and $E_1 \cup E_2 = E$.



Theorem 2 ([1]). Let $G$ be a series connection of two connected graphs $G_1$ and $G_2$. If both $\mathcal{H}(G_1)$ and $\mathcal{H}(G_2)$ have the simplex property, then so does $\mathcal{H}(G)$.

Definition 8. A connected graph $G = (V, E)$ has a 3-parallel decomposition if there exist two vertices $u$ and $v$ such that $G$ is decomposed into nonempty connected graphs $G_1 = (V_1, E_1)$, $G_2 = (V_2, E_2)$, and $G_3 = (V_3, E_3)$ such that (1) $V = V_1 \cup V_2 \cup V_3$, (2) $V_1 \cap V_2 = V_2 \cap V_3 = V_1 \cap V_3 = \{u, v\}$, and (3) $E$ is the disjoint union of $E_1$, $E_2$, and $E_3$ (see Fig. 1).

Consider a family $\mathfrak{F}$ of connected graphs, and assume that it is closed under the subgraph operation; that is, any connected subgraph of $G \in \mathfrak{F}$ is also in $\mathfrak{F}$. A graph $G \in \mathfrak{F}$ is a minimal counterexample for the simplex property in $\mathfrak{F}$ if $\mathcal{H}(G)$ does not satisfy the simplex property but $\mathcal{H}(G')$ has the simplex property for every connected subgraph $G'$ of $G$.

Theorem 3. A minimal counterexample $G$ for the simplex property in $\mathfrak{F}$ is 2-connected and does not have a 3-parallel decomposition.
Proof. 2-connectivity follows from Theorem 2. Thus, we assume that $G$ has a 3-parallel decomposition at $u$ and $v$, and derive a contradiction. We define the following three subgraphs of $G$: $G_{(1,2)}$ is the union of $G_1$ and $G_2$, $G_{(1,3)}$ is the union of $G_1$ and $G_3$, and $G_{(2,3)}$ is the union of $G_2$ and $G_3$. These graphs are connected and hence satisfy the simplex property because of the minimality of $G$. By symmetry, we can assume that the shortest path between $u$ and $v$ is in $G_1$. Then, both $G_{(1,2)}$ and $G_{(1,3)}$ are SPC subgraphs. Thus, $\mathcal{H}(G)|_{V(G_{(1,2)})} = \mathcal{H}(G_{(1,2)})$ and $\mathcal{H}(G)|_{V(G_{(1,3)})} = \mathcal{H}(G_{(1,3)})$, where $V(G_{(i,j)})$ is the vertex set of $G_{(i,j)}$. Thus, it is clear that each vertex of $G_1$ is double-covered.

A vertex $x$ in $V(G_{(2,3)})$ is called biased if either the shortest path in $G$ from $x$ to $u$ goes through $v$ or the shortest path from $x$ to $v$ goes through $u$. We claim that a biased vertex is double-covered. Without loss of generality, we assume that $x$ is a vertex of $G_2$ and the shortest path $p$ from $x$ to $v$ goes through $u$. Then, any vertex of $G_2$ on $p$ is also biased, and $G_{(1,3)} \cup p$ is an SPC subgraph. Thus, $x$ is in the intersection of the two SPC subgraphs $G_{(1,3)} \cup p$ and $G_{(1,2)}$, and hence double-covered. Thus, $S = V(G_1) \cup B$ is double-covered, where $B$ is the set of all biased vertices.

Now we are ready to apply Theorem 1. Consider an arbitrary $S$-contracted compatible set $A$ of $\mathcal{H}(G)$. We claim that $A$ is also $\mathcal{H}(G_{(2,3)})$-compatible. If this claim is true, $A$ must be affine independent (since $\mathcal{H}(G_{(2,3)})$ has the simplex property), and we can conclude from Theorem 1 that $\mathcal{H}(G)$ has the simplex property, which is a contradiction.

We give a proof of the claim. Let $\alpha$ and $\beta$ be any two members of $A$. Consider any shortest path $p$ of $G_{(2,3)}$. Let $x$ and $y$ be the endpoints of $p$, and let $P$ be the vertex set of the path. It suffices to show the compatibility $|\alpha(P) - \beta(P)| \le 1$.

If all the vertices on $p$ are in $S$, then $\alpha(P) = \beta(P) = 0$, and the compatibility condition is trivial. Thus, we assume that there exist vertices of $V \setminus S$ on $p$. Let $x_0$ be the nearest vertex in $(V \setminus S) \cap P$ to $x$. The subpath $p_0$ of $p$ between $x_0$ and $y$ is the shortest path in $G_{(2,3)}$ between them. Let $P_0$ be the set of vertices on it.

Fig. 1. 3-parallel decomposition of G.

Consider the shortest path $q$ with respect to $G$ between $x_0$ and $y$. If it contains both $u$ and $v$, either the shortest path between $x_0$ and $u$ contains $v$ or that between $x_0$ and $v$ contains $u$. Thus, $x_0$ must be biased, and hence in $S$, contradicting our hypothesis. Therefore, without loss of generality, we can assume that $q$ does not contain $v$. This means that $q$ contains no vertex of $G_1 \setminus \{u\}$, since otherwise $q$ would have to go through $u$ twice. Thus, $q$ is in $G_{(2,3)}$, and hence $q = p_0$, since the shortest path between two given vertices is unique in our definition of the shortest-path hypergraph. Thus, from the compatibility on a shortest path of $G$, $|\alpha(P_0) - \beta(P_0)| \le 1$. Since the assignment on each vertex of $S$ is 0 for each of $\alpha$ and $\beta$, we have the compatibility $|\alpha(P) - \beta(P)| \le 1$ on $p$. Thus, $A$ is $\mathcal{H}(G_{(2,3)})$-compatible, and we have the claim. □

Thus, the simplex property holds for a graph that is constructed by applying a series of 3-parallel connections and series connections to pieces (such as paths, cycles, unit edge-length complete graphs, and unit edge-length meshes) for which the simplex property is known to hold. We give a typical example in the following. A graph is series-parallel if it does not contain a subdivision of the complete graph $K_4$ as a subgraph. Here, a subdivision of a graph is obtained by replacing edges of the original graph with chains. A connected graph is outerplanar if and only if it has a planar drawing in which every vertex lies on the outer face boundary. An edge that is not on the outer face boundary is called a chord. A series-parallel graph is planar, and an outerplanar graph is series-parallel.

Theorem 4. If $G$ is series-parallel, $\mathcal{H}(G)$ has the simplex property.

Proof. Clearly, the family of connected series-parallel graphs is closed under the subgraph operation, and we consider its minimal counterexample $G$. By Theorem 3, $G$ is 2-connected. If $G$ is not outerplanar, $G$ has a vertex $v$ in the interior of the outer face cycle $C$. Since $G$ is 2-connected, $v$ is connected to at least two vertices of $C$ without using edges on $C$. If $v$ is connected to three vertices of $C$, the union of these paths and $C$ contains a subdivision of $K_4$, which is a contradiction. Thus, $v$ is connected to exactly two vertices $u_1$ and $u_2$ of $C$, and we have a 3-parallel decomposition at $u_1$ and $u_2$, again contradicting Theorem 3. Thus, $G$ must be outerplanar. If $G$ has a chord, $G$ has a 3-parallel decomposition at the end vertices of the chord. Thus, $G$ does not have a chord. However, a 2-connected outerplanar graph without a chord must be a cycle, and we have already shown the simplex property for cycles. Thus, we have the theorem. □

As a corollary, we have $\mu(\mathcal{H}(G)) = n + 1$ for a series-parallel graph $G$, extending the result for outerplanar graphs given in [14].

Moreover, it can be observed from a classification of substructures of series-parallel graphs given by Juvan et al. [10] (see also [15]) that any (non-cycle) 2-connected series-parallel graph has a 3-parallel decomposition in which two of the components are paths. Using this observation and the argument given in [14] for outerplanar graphs, we obtain the following (we omit the details in this version):
Theorem 5. We can enumerate all the global roundings of an input a for the
shortest path hypergraph of a series-parallel graph with n vertices in 0{n^) time. 

4 Geometric Problems 

We consider some geometric hypergraphs that are ASP hypergraphs. Consider a set $V$ of $n$ points in the plane. For each pair $u = (x_u, y_u)$ and $v = (x_v, y_v)$ of points, $\overline{uv}$ is the line segment connecting them. Let $B(u,v)$ be the region below the segment $\overline{uv}$; that is, $B(u,v) = \{(x, y) \mid x \in [x_u, x_v],\ y - y_u \le \frac{y_v - y_u}{x_v - x_u}(x - x_u)\}$ if $x_u \neq x_v$. If $x_u = x_v$, we define $B(u,v) = \overline{uv}$. Let $R(u,v)$ be the closed isothetic rectangle which has $u$ and $v$ at diagonally opposite corners, and let $T(u,v) = B(u,v) \cap R(u,v)$ be the lower right-angle isothetic triangle which has $\overline{uv}$ as its longest boundary edge. We define $T(u,u) = R(u,u) = B(u,u) = \overline{uu} = \{u\}$.

We consider the hypergraphs $\mathcal{S} = (V, \{V \cap \overline{uv} : u, v \in V\})$, $\mathcal{B} = (V, \{V \cap B(u,v) : u, v \in V\})$, $\mathcal{R} = (V, \{V \cap R(u,v) : u, v \in V\})$, and $\mathcal{T} = (V, \{V \cap T(u,v) : u, v \in V\})$. See Fig. 2 for intuition.
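The ranges above can be sketched as membership tests on integer coordinates. The names are hypothetical, and the below-the-segment condition is evaluated after multiplying through by $x_v - x_u > 0$, so no division or rounding occurs. Swapping $u$ and $v$ when $x_u > x_v$ does not change the supporting line, so $B(u,v) = B(v,u)$ here.

```python
def in_R(p, u, v):
    """p lies in the closed isothetic rectangle with u and v at opposite corners."""
    (x, y), (xu, yu), (xv, yv) = p, u, v
    return min(xu, xv) <= x <= max(xu, xv) and min(yu, yv) <= y <= max(yu, yv)

def in_B(p, u, v):
    """p lies in B(u, v): on or below segment uv within its x-range
    (B(u, v) is the segment itself when x_u == x_v)."""
    (x, y), (xu, yu), (xv, yv) = p, u, v
    if xu == xv:
        return x == xu and min(yu, yv) <= y <= max(yu, yv)
    if xu > xv:                       # orient so that xu < xv; the line is unchanged
        (xu, yu), (xv, yv) = (xv, yv), (xu, yu)
    # y - yu <= slope * (x - xu), multiplied through by (xv - xu) > 0 to stay exact
    return xu <= x <= xv and (y - yu) * (xv - xu) <= (yv - yu) * (x - xu)

def in_T(p, u, v):
    """T(u, v) = B(u, v) intersected with R(u, v)."""
    return in_B(p, u, v) and in_R(p, u, v)

def hyperedge(V, u, v, region):
    """V intersected with region(u, v), as a frozenset of points."""
    return frozenset(p for p in V if region(p, u, v))

V = [(0, 0), (2, 2), (1, 0), (1, 2), (2, 0)]
print(sorted(hyperedge(V, (0, 0), (2, 2), in_T)))  # [(0, 0), (1, 0), (2, 0), (2, 2)]
```

The point $(1, 2)$ lies above the segment and is excluded; a point such as $(1, -5)$ would be in $B$ but not in $T$, since it falls outside the rectangle.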

They are typical examples of range spaces. $\mathcal{B}$ becomes the hypergraph consisting of all intervals if the point set is convex (i.e., it lies on the lower chain of its own convex hull) and is ordered by $x$-coordinate. Thus, the global rounding for $\mathcal{B}$ is a natural extension of the $\mathcal{I}_n$-global rounding (sequence rounding). We also remark that $\mathcal{S}$ corresponds to the sets stabbed by segments, and equals $\mathcal{K}_n = (V, V \times V)$ if the point set is in general position.

Lemma 6. $\mathcal{S}$, $\mathcal{B}$, and $\mathcal{T}$ are ASP hypergraphs for any point set $V$. $\mathcal{R}$ is an ASP hypergraph if no four points of $V$ form the corners of an isothetic rectangle.

4.1 Simplex Property of Range Spaces 

Theorem 6. Each of $\mathcal{B}$, $\mathcal{T}$, and $\mathcal{S}$ has the simplex property. If no four points of $V$ form the corners of an isothetic rectangle, $\mathcal{R}$ has the simplex property.




Fig. 2. $B(u,v)$ (left) and $T(u,v)$ (right).





On Geometric Structure of Global Roundings for Graphs and Range Spaces 465 



Proof. We prove the simplex property by induction on the number $M$ of horizontal lines and the number $N$ of vertical lines on which $V$ lies. Because of space limitations, we only deal with $\mathcal{T}$ here. If $N = 1$ or $M = 1$, the problem reduces to the sequence rounding problem. If $N = M = 2$, we can prove it directly. Thus, we assume that $V$ lies on $M \ge 2$ horizontal lines and on $N \ge 2$ vertical lines, and that the statement holds if the point set lies on fewer than $M$ horizontal lines or fewer than $N$ vertical lines. Let $X_i$ (resp. $Y_j$) denote the set of points of $V$ on the $i$-th vertical (resp. $j$-th horizontal) line, and let $X_{\ge 2} = V \setminus X_1$ and $X_{\le N-1} = V \setminus X_N$. They are SPC subsets of $V$. Let $\mathcal{T}^+ = \mathcal{T}|_{X_{\ge 2}}$ and $\mathcal{T}^- = \mathcal{T}|_{X_{\le N-1}}$. From Lemma 3, they are ASP hypergraphs, and by the induction hypothesis, they have the simplex property. Thus, $X_{\ge 2} \cap X_{\le N-1} = V \setminus (X_1 \cup X_N)$ is double-covered. Similarly, $V \setminus (Y_1 \cup Y_M)$ is double-covered (note that this set is empty if $M = 2$). Since the union of two double-covered sets is also double-covered, $S = [V \setminus (X_1 \cup X_N)] \cup [V \setminus (Y_1 \cup Y_M)]$ is double-covered. Thus, we can apply Theorem 1 and consider the restriction of $\mathcal{T}$ to $V \setminus S$. Any point in $V \setminus S$ must be at a corner of the minimum enclosing isothetic rectangle of $V$; thus $V \setminus S$ has at most four points, for which we can directly show the simplex property of the restriction of $\mathcal{T}$. Thus, $\mathcal{T}$ has the simplex property. □

We remark that $\mathcal{R}$ is smaller than the range space corresponding to all isothetic rectangles. However, since $\mathcal{R}$ has the simplex property, the range space of all isothetic rectangles also has the simplex property because of Lemma 1. Similarly, since $\mathcal{T}$ has the simplex property, the range space of all isothetic right-angle triangles has the simplex property.

In the digital halftoning application, it is important to consider the case where $V$ is the set of points of an $M \times N$ grid and the hyperedges are given by rectangles. Let $v_{i,j}$ be the point at the $(i, j)$ position. Given two points $v = v_{s,t}$ and $w = v_{k,\ell}$ such that $s \le k$, let $R(v, w)$ be the set of points in the rectangle which has $v$ and $w$ as corners. Unfortunately, if we consider the range space $\mathcal{R}$ on the set of grid points, $\mathcal{R}$ is not an ASP hypergraph, since $R(v_{i,j}, v_{k,\ell}) = R(v_{i,\ell}, v_{k,j})$ and the one-to-one property does not hold. Indeed, this hypergraph does not have the simplex property (we have a counterexample). However, with a slight modification, we can apply our theory. The chipped rectangle $\hat{R}(v, w)$ is obtained by removing the upper corner point that is neither $v$ nor $w$ if $v$ and $w$ are neither on the same row nor on the same column. We define $\hat{R}(v, w) = R(v, w)$ if $v$ and $w$ are on the same row or on the same column. We define $\mathcal{CR} = (V, \{\hat{R}(v, w) : v, w \in V\})$.
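The failure of the one-to-one property for full grid rectangles, and how chipping repairs it, can be checked on a small grid. This sketch fixes a convention that the text leaves implicit: points are (row, column) pairs, rows grow upward, and the removed "upper" corner is therefore the one in the larger row $k$, in the column $t$ of the lower point. This convention and the function names are our assumptions; either choice of corner distinguishes the two diagonal pairs of the same rectangle.

```python
from itertools import combinations_with_replacement

def rectangle(v, w):
    """Grid points of the full rectangle R(v, w); points are (row, column) pairs."""
    (s, t), (k, l) = v, w
    return frozenset((i, j) for i in range(min(s, k), max(s, k) + 1)
                     for j in range(min(t, l), max(t, l) + 1))

def chipped_rectangle(v, w):
    """Chipped rectangle: R(v, w) minus one corner that is neither v nor w.
    Assumed convention: rows grow upward, so the removed corner is (k, t)."""
    (s, t), (k, l) = sorted([v, w])       # now s <= k
    pts = set(rectangle(v, w))
    if s != k and t != l:                 # v, w share neither a row nor a column
        pts.discard((k, t))
    return frozenset(pts)

grid = [(i, j) for i in range(3) for j in range(3)]
pairs = list(combinations_with_replacement(grid, 2))
print(rectangle((0, 0), (1, 1)) == rectangle((0, 1), (1, 0)))           # True: not one-to-one
print(len({chipped_rectangle(v, w) for v, w in pairs}) == len(pairs))  # True: one-to-one
```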

Theorem 7. $\mathcal{CR}$ has the simplex property.

4.2 Algorithms for Computing Roundings 

We can design a polynomial-time algorithm for enumerating all the global roundings of an input real assignment $\mathbf{a}$ for each of $\mathcal{B}$, $\mathcal{R}$, $\mathcal{S}$, $\mathcal{T}$, and $\mathcal{CR}$. We briefly explain the algorithm for $\mathcal{R}$. Basically, we can apply a building-up (or divide-and-conquer) strategy, in which we first compute the restrictions to $X_{\ge \lceil n/2 \rceil}$ and $X_{\le \lceil n/2 \rceil - 1}$ recursively, and check the rounding condition for $\mathcal{R}$ on each possible concatenated rounding. Testing each concatenated rounding takes polynomial time by using an efficient range-searching method, and hence the total time complexity is polynomial. This contrasts sharply with the fact that it is NP-hard to decide the existence of a global rounding for the family of all $2 \times 2$ square regions in a grid [3,4].
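The building-up strategy is easiest to see in one dimension. The sketch below applies it to the interval hypergraph $\mathcal{I}_n$ rather than to $\mathcal{R}$ (a deliberate simplification; function names are hypothetical, indices are 0-based, and inputs are exact fractions). A global rounding restricted to either half is a global rounding of that half, so it suffices to concatenate the roundings of the two halves and test only the intervals that cross the midpoint. By $\mu(\mathcal{I}_n) = n + 1$, each merge examines at most $(n+1)^2$ candidates.

```python
from fractions import Fraction

def crossing_ok(alpha, a, lo, mid, hi):
    """|sum(alpha) - sum(a)| < 1 on every interval [i, j] with lo <= i < mid <= j < hi.
    alpha covers positions lo..hi-1 (alpha[0] is position lo)."""
    for i in range(lo, mid):
        run = sum(alpha[p - lo] - a[p] for p in range(i, mid))
        for j in range(mid, hi):
            run += alpha[j - lo] - a[j]
            if abs(run) >= 1:
                return False
    return True

def sequence_roundings(a, lo=0, hi=None):
    """All global roundings of a[lo:hi] for the interval hypergraph, by building up."""
    if hi is None:
        hi = len(a)
    if hi - lo == 1:
        x = a[lo]
        return [(0,)] if x == 0 else [(1,)] if x == 1 else [(0,), (1,)]
    mid = (lo + hi) // 2
    left = sequence_roundings(a, lo, mid)
    right = sequence_roundings(a, mid, hi)
    return [l + r for l in left for r in right if crossing_ok(l + r, a, lo, mid, hi)]

a = [Fraction(1, 2)] * 3
print(sequence_roundings(a))  # [(0, 1, 0), (1, 0, 1)]
```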

If we consider $\mathcal{CR}$, the linear discrepancy is known to be O(log^ n) and $\Omega(\log n)$ [6], and hence a given input may well have no global rounding. Thus, we may consider a heuristic algorithm for computing a good (not necessarily global) rounding by using the building-up strategy, in which we select the $K$ best roundings (with respect to the discrepancy) from those obtained by concatenating pairs of assignments constructed in the previous stage, and proceed to the next stage. Our theorem implies that if we set $K \ge n + 1$, we never miss a global rounding if one exists.
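The $K$-best heuristic can be sketched the same way. Again we illustrate it on $\mathcal{I}_n$ rather than $\mathcal{CR}$ (a simplification; names are hypothetical). After each merge, the candidates are sorted by interval discrepancy and only the $K$ best are kept. Global roundings of a segment have discrepancy below 1 and all other candidates have discrepancy at least 1, so they sort first; since there are at most (segment length) $+ 1 \le n + 1$ of them, choosing $K \ge n + 1$ never discards one.

```python
from fractions import Fraction

def interval_discrepancy(alpha, a):
    """max over intervals [i, j] of |sum(alpha[i..j]) - sum(a[i..j])|, O(n^2) scan."""
    best = Fraction(0)
    for i in range(len(a)):
        run = Fraction(0)
        for j in range(i, len(a)):
            run += alpha[j] - a[j]
            best = max(best, abs(run))
    return best

def beam_roundings(a, K):
    """Building-up heuristic: after each merge keep the K candidates with the smallest
    interval discrepancy on the current segment (ties keep their generation order)."""
    def build(lo, hi):
        if hi - lo == 1:
            return [(0,), (1,)]
        mid = (lo + hi) // 2
        left, right = build(lo, mid), build(mid, hi)
        cands = [l + r for l in left for r in right]
        cands.sort(key=lambda c: interval_discrepancy(c, a[lo:hi]))
        return cands[:K]
    return build(0, len(a))

a = [Fraction(1, 2)] * 3
kept = beam_roundings(a, K=len(a) + 1)
print([c for c in kept if interval_discrepancy(c, a) < 1])  # [(0, 1, 0), (1, 0, 1)]
```

With a smaller $K$ the routine still returns a low-discrepancy rounding, but global roundings may be pruned.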



5 Concluding Remarks 

If we could replace 3-parallel decomposition with 2-parallel decomposition in Theorem 3, we could prove the conjecture, since any 2-connected graph decomposes into $e$ and $G \setminus \{e\}$ at the endpoints of any edge $e$. For a special input where each entry of $\mathbf{a}$ is $0.5 + \varepsilon$, it has been shown that there are at most $m + 1$ global roundings for unit edge-length connected graphs (except trees) with $m$ edges [9]. However, it is not known whether $\mu(\mathcal{H}(G))$ is polynomially bounded in general. Another interesting question is whether there is a hypergraph with the simplex property with fewer than $n(n+1)/2$ hyperedges (including singletons).
Abstract. Algorithms are considered for the external connected-components problem. The main contribution is an algorithm which, for a graph with n nodes and m edges, has an expected running time bounded by $O(m \cdot \log\log n)$ when randomizing the node indices. A blocked version of this algorithm, which is perfectly suited for external application, handles bundles of W nodes at a time. For random graphs, the running time of this algorithm is bounded by O(log log(n^/(m · W)) · m). A special case of the algorithm solves the list-ranking and tree-rooting problems. The running time of this algorithm is linear in the number of involved nodes, independently of their arrangement.

Keywords: External Algorithms, Graph Problems, Connected Components, List Ranking, Tree Rooting.



1 Introduction 

Connected components, CC, is the problem of determining, for an undirected graph $G = (V, E)$, a function $c : V \to \{0, 1, \ldots\}$ such that $c(u) = c(v)$ iff $u$ and $v$ are connected by a path in $G$. In the list-ranking problem the input is a set of linked lists. The task is to determine for each node the index of the last node of its list and the distance thereto. In tree rooting the input is a set of trees (a forest), and the task is to determine for each node the index of the root of its tree.
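For testing on small inputs, the three problem definitions can be pinned down by simple internal-memory reference implementations. These are not the external or randomized algorithms of this paper; the function names and encodings are our assumptions: a successor array with `None` marking the last node of a list, and a parent array in which a root is its own parent. Connected components uses union-find, so each label is the smallest node index of its component.

```python
def connected_components(n, edges):
    """Union-find labelling: c[u] == c[v] iff u and v are connected.
    Each label is the smallest node index in its component."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)  # the smaller index stays the root
    return [find(x) for x in range(n)]

def list_ranking(succ):
    """succ[i] is the next node, or None at the end of a list (lists assumed acyclic).
    Returns (last, dist): the last node of i's list and the distance to it."""
    n = len(succ)
    last, dist = [None] * n, [None] * n
    for start in range(n):
        path, x = [], start
        while last[x] is None and succ[x] is not None:
            path.append(x)
            x = succ[x]
        if last[x] is None:                  # x is a tail not yet processed
            last[x], dist[x] = x, 0
        for y in reversed(path):             # fill in from the back of the walk
            last[y], dist[y] = last[succ[y]], dist[succ[y]] + 1
    return last, dist

def tree_rooting(parent):
    """parent[i] == i marks a root. Returns the root of each node's tree."""
    n = len(parent)
    root = [None] * n
    for start in range(n):
        path, x = [], start
        while root[x] is None and parent[x] != x:
            path.append(x)
            x = parent[x]
        if root[x] is None:
            root[x] = x
        for y in reversed(path):
            root[y] = root[parent[y]]
    return root

print(connected_components(5, [(0, 3), (3, 4), (1, 2)]))  # [0, 1, 1, 0, 0]
print(list_ranking([1, 2, None, 4, None]))                  # ([2, 2, 2, 4, 4], [2, 1, 0, 1, 0])
print(tree_rooting([0, 0, 1, 3, 3]))                        # [0, 0, 0, 3, 3]
```

Each node is finalized once, so the list-ranking and tree-rooting sketches run in linear time in internal memory. Their pointer chasing is exactly the access pattern that becomes expensive externally.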

An area of increasing interest is computing on data sets whose size N exceeds the size M of the main memory. This kind of computing is called external computing, and algorithms designed to solve problems involving such large data sets are called external algorithms. See [12] for an overview with many references to work done in this field. A general introduction to the topic can be found in [9]. The central issue of external computing is that accessing the secondary memory takes orders of magnitude longer than performing an internal operation. To partially amortize this delay, each access moves a large block of B consecutive words from secondary memory into the main memory or vice versa. Chiang et al. have shown how to solve list ranking with $O(n \cdot \lceil \log_{M/B}(n/M) \rceil / B)$ I/Os, i.e., so that in total $O(n \cdot \lceil \log_{M/B}(n/M) \rceil / B)$ blocks of B words each are read and written. Tree rooting can be solved along the same lines. Connected components has been solved by several authors [4,6,1] with $O((n + m) \cdot \lceil \log_{M/B}(n/M) \rceil / B)$ I/Os.

The main topic of this paper is a detailed analysis of the external connected-components problem. In the full version of this paper it is shown how the number of I/Os can be bounded by just $(8 \cdot \lceil \log_{M/B}(2 \cdot n/M) \rceil \cdot (m + o(m)) + 2 \cdot m + O(n))/B$. For large m/n this is fine, but for sparse graphs one should also bound the $O(n)$ term. To achieve this, we present a novel way to reduce the number of nodes. This algorithm is so efficient that in most practical cases it can be used as a stand-alone routine. Without randomization, the performance of this algorithm may be bad, but when the node indices are randomized, its time consumption is bounded by $O(m \cdot \log\log n)$. A blocked version of this algorithm, which is perfectly suited for external application, handles bundles of W nodes at a time. For random graphs, the running time of this algorithm is bounded by $O(\log\log(n^2/(m \cdot W)) \cdot m)$. In an external application we can take $W = M/2$. Experiments with various classes of graphs show that the number of I/Os may exceed $(14 \cdot m + 11 \cdot n)/B \cdot \lceil \log_{M/B}(n/M) \rceil$ only for extreme values of n. Due to lack of space, many proofs are omitted.

Most efficient connected-components algorithms, such as [10,3], use at their lowest level some node reduction based on the idea in [7]. It would be better to substitute our new algorithm, which is several times faster. Another strong feature of our new algorithm is that it requires external storage for m edges, whereas all alternatives require storage for $2 \cdot m$ edges. This is a great advantage, because for very large problems the capacity of the secondary memory is a limiting factor. This node-reduction algorithm can easily be modified to solve list ranking and tree rooting in a simple and efficient way; it is four times faster than previous approaches.



2 Preliminaries 

We consider sequential and external algorithms. Here and in the following, by a sequential algorithm we mean an algorithm whose performance is analyzed under the unit-cost assumption of the von Neumann model. For the external algorithms, we consider the number of I/Os: the number of blocks of size B that are read from and written to the external memory. The size of the main memory is denoted by M and the problem size by N. If $N < M^2/(4 \cdot B)$, the input can be divided into chunks of size M/2, and at any time the main memory can hold one such chunk plus a block of size B for every other chunk: there are $2N/M$ chunks, so their buffers occupy $2N \cdot B/M < M/2$ words. These additional blocks can be used as buffers for updates that are made to the data in the other chunks. The above condition, satisfied for most practical values of N, M and B, allows us to focus on the essentials. Therefore, the algorithms are first formulated under an assumption of this type, while the general case is treated as an extension.
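The memory budget behind the chunking condition can be checked numerically. The Python sketch below uses a hypothetical helper `chunking_fits` (not from the paper) that tests whether one chunk of M/2 words plus one B-word buffer per chunk fits into M words. For simplicity it reserves a buffer for every chunk, including the one currently held, so it is marginally more conservative than the text.

```python
def chunking_fits(N: int, M: int, B: int) -> bool:
    """Hypothetical check: one chunk of M/2 words plus one B-word buffer
    per chunk must fit into a main memory of M words."""
    chunks = -(-2 * N // M)          # ceil(N / (M/2)) chunks of size M/2
    return M // 2 + chunks * B <= M  # resident chunk + one buffer per chunk


# Condition from the text: N < M^2 / (4B); here M^2 / (4B) = 2^28 words.
M, B = 2**20, 2**10
print(M * M // (4 * B) == 2**28)            # True
print(chunking_fits(2**28, M, B))           # True: exactly at the limit
print(chunking_fits(2**28 + 2**20, M, B))   # False: two extra chunks
```

At the limit the 512 buffers of 1024 words exactly fill the half of memory not used by the resident chunk.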

We consider graphs $G = (V, E)$, where V is a set of n nodes and E a set of m edges. The ratio $g = m/n$ will be called the density of the graph. A graph is called (un)directed if the edges are (un)directed. Two nodes u and v are adjacent if there is an edge (u, v); they are in the same component if there is a path in G from u to v. We assume that the input for the graph is given as a file with $2 \cdot m$ integers, which are interpreted as m pairs of nodes.

A problem on a graph with n nodes and m edges is called semi-external if the information related to the nodes fits in the main memory, but the information related to the edges does not. In the case of connected components, the semi-external problem is much easier than the external problem:

Lemma 1. If $n < M$, CC can be solved with $2 \cdot m/B$ I/Os and $O(n + m \cdot \alpha(m, n))$ internal work.

A similar result was already derived by Abello et al. in [1]. 
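The semi-external setting of Lemma 1 keeps one label per node in memory while the edges are streamed once; reading the $2 \cdot m$ integers sequentially costs $2 \cdot m/B$ I/Os. The following is a minimal Python sketch under the assumption that the edge file is presented as an iterable of pairs. It uses union-find with path halving and links by minimum index rather than by rank, so it illustrates the access pattern but does not by itself guarantee the $O(n + m \cdot \alpha(m, n))$ bound of the lemma; union by rank with a separately stored minimum per root would be needed for that.

```python
from typing import Iterable, List, Tuple


def semi_external_cc(n: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Label every node with the minimum index of its component."""
    parent = list(range(n))  # node information kept in main memory

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:  # single sequential pass over the edge file
        ru, rv = find(u), find(v)
        if ru != rv:
            # Hang the larger root below the smaller one, so every root
            # is the minimum index of its component.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv
    return [find(x) for x in range(n)]


labels = semi_external_cc(6, iter([(0, 1), (2, 3), (3, 1), (4, 5)]))
print(labels)  # [0, 0, 0, 0, 4, 4]
```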

Previous external CC algorithms simulated the PRAM algorithm of [5], which reduces the number of nodes by a factor of at least two in every iteration. However, the number of edges may not diminish, causing an extra logarithmic factor. Several authors [4,6,1] used a randomized selection procedure based on work by Karger et al. [8] to reduce the number of edges, achieving asymptotic optimality. In the full version of this paper it is shown that for a refinement of this strategy the following holds:

Theorem 1. External CC can be solved with an expected number of $(8 \cdot \lceil \log_{M/B}(2 \cdot n/M) \rceil \cdot (m + o(m)) + 2 \cdot m + O(n))/B$ I/Os.

3 Faster Node Reduction 

The value of the $O(n)$ term in the time consumption of the CC algorithm is of crucial importance for sparse graphs. Therefore, we propose an efficient procedure for reducing the number of nodes. It has some similarity with Hirschberg's algorithm, but the specific order in which the nodes are processed makes it far more efficient. The presented method also has some similarity with the 'functional' algorithm in [1], where the input is repeatedly split into two equal parts. We first describe a simple and efficient sequential reduction algorithm; then we show how an external version can be derived.

3.1 Sequential Algorithm 

As input we assume a file with the numbers n and m followed by m pairs (i, j), $0 \le i, j < n$, indicating an undirected edge between i and j. The algorithm is correct for all inputs, but in the analysis we will assume that the indices are randomized. Initially all edges are scanned and added to an adjacency list: (u, v) is added to the adjacency list of u if $u > v$; else it is added to the list of v. We call the constructed graph representation a one-sided adjacency list. After these preparatory steps, two passes through the nodes are made: one starting with node $n - 1$ and ending with node 0, the other running in the opposite direction. The complete algorithm, in which for each node u the minimum of the indices of the nodes within its component is computed in min(u), has the following extremely simple structure:



Proc FAST_CONCOMPS
1. for all (u, v) do
     if v < u then add v to the list of u;
     else if u < v then add u to the list of v;

2. for u = n - 1 to 0 do
     if the list of u is empty then min(u) = u;
     else min(u) = smallest entry in the list of u;
     for all v ≠ min(u) in the list of u do
       insert min(u) into the list of v;

3. for u = 1 to n - 1 do
     min(u) = min(min(u)).

During the right-to-left pass of step 2, the list of each node u is inspected, and min(u) is inserted into the lists of all other nodes occurring in it. This is equivalent to contracting the edge (u, min(u)) in the one-sided adjacency-list representation. Such contractions are illustrated in Figure 1. A node u whose list is still empty when it is processed has minimum index within its component: min(u) = u has reached its final value. The other nodes update their min-value one more time during the left-to-right pass of step 3.
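The three steps of FAST_CONCOMPS translate almost line by line into an in-memory program. The Python sketch below is an illustration, not the authors' C implementation: it stores each one-sided adjacency list as a set, so an entry that is already present (the first of the three cases discussed below) is absorbed, and it also counts attempted insertions, the quantity that dominates the running time of step 2.

```python
def fast_concomps(n, edges):
    """Sequential FAST_CONCOMPS; returns (min per node, attempted insertions)."""
    # Step 1: one-sided adjacency lists; lists[u] only holds nodes v < u.
    lists = [set() for _ in range(n)]
    for u, v in edges:
        if v < u:
            lists[u].add(v)
        elif u < v:
            lists[v].add(u)          # self-loops (u == v) are dropped
    mins = list(range(n))
    insertions = 0
    # Step 2: right-to-left pass, contracting the edge (u, min(u)).
    for u in range(n - 1, -1, -1):
        if lists[u]:
            mins[u] = min(lists[u])
            for v in lists[u]:
                if v != mins[u]:
                    lists[v].add(mins[u])  # a set absorbs duplicate entries
                    insertions += 1
        lists[u] = set()              # column u is not needed any more
    # Step 3: left-to-right pass; mins[mins[u]] is already final here.
    for u in range(1, n):
        mins[u] = mins[mins[u]]
    return mins, insertions


mins, ins = fast_concomps(6, [(5, 1), (4, 2), (2, 5), (3, 0)])
print(mins, ins)  # [0, 1, 1, 0, 1, 1] 1
```

To obtain the randomized setting assumed in the analysis, the nodes would be relabelled by a random permutation (for instance one produced with `random.shuffle`) before the call, and the labels mapped back afterwards.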







Fig. 1. Repeatedly reducing the size of a graph by removing the node u with maximum index, contracting the edge (u, min(u)), where min(u) gives the minimum node index of any of the neighbors of u.



Invariant 1. At any time, the list of a node u only contains nodes v < u. 

Proof: The invariant is not violated during step 1. During step 2, min(u), which is smaller than every other entry v in the list of u, is inserted in the list of v. □



Invariant 2. During step 2, after the iteration for a given u, for any $v, w < u$, v and w are connected via a path over edges exclusively lying in the lists of nodes $x < u$ iff v and w are connected by a path in the initial graph.
Proof: Assume by induction that the invariant holds after iteration $u + 1$. Consider a path from v to w. As long as all nodes on this path are smaller than u, everything is fine. If u lies on the path, then there is a sub-path v', u, w' of two edges on which u is the largest node. According to Invariant 1, we can assume that v' and w' are stored in the list of u. So, min(u) is added to the lists of v' and w' (the cases min(u) = v' or min(u) = w' are slightly different). Thus, the sub-path v', u, w' is replaced by v', min(u), w', and v and w stay connected. On the other hand, no unrelated nodes are connected: all nodes that are connected afterwards were connected through u before. □



Lemma 2. FAST_CONCOMPS correctly solves the CC problem.

Proof: If during step 2 we encounter a node u with an empty list, u must constitute the smallest index of its component. Assume it does not. Then, by Invariant 2, there is a path from u to a smaller node whose other nodes x all satisfy $x < u$. So, there must exist an edge from u to some $v < u$, and according to Invariant 1 it has to appear in the list of u. This is a contradiction. The correctness of the assigned component numbers is easily established by induction. Either node u is the smallest node of its component and already has the correct min-value after step 2, or $min(u) < u$, and we may inductively assume that node min(u) has already computed its correct component number. □



3.2 Time Complexity 

Step 1 takes $O(m)$ time and Step 3 $O(n)$. Thus the running time of FAST_CONCOMPS is dominated by the time for Step 2, which is proportional to the number of performed insertions. The problem is that insertions may trigger further insertions. In the example of Figure 1, adding the edge (4, 2) when node 11 is eliminated leads to adding the edge (2, 0) when node 4 is eliminated. We have constructed an input graph with n nodes and $O(n)$ edges for which the algorithm makes insertions. This problem is strongly reduced by randomization.

We first study a slightly different algorithm which is easier to analyze: it is 
identical to fast_CONCOMPS, except that in Step 2 it selects the edge to contract 
independently at random from the adjacency list of u. 

Theorem 2. If the indices of the nodes are randomized, the expected number of operations for the modification of FAST_CONCOMPS on graphs with n nodes and m edges is bounded by $O(n + m \cdot \log n)$.



Corollary 1. The modification of FAST_CONCOMPS reduces the number of nodes from n to n', processing an expected number of less than $\sum_{u=n'}^{n} 2 \cdot m/(u + 1) \approx 2 \cdot m \cdot (\ln n - \ln n')$ edges.
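The logarithmic approximation in Corollary 1 is simply the harmonic sum written as a difference of natural logarithms. The short Python check below compares both sides numerically; the summation range from n' to n is our reading of the corollary, and the function name is ours.

```python
import math


def corollary1(n, n_prime, m):
    """Sum from Corollary 1 and its logarithmic approximation."""
    exact = sum(2 * m / (u + 1) for u in range(n_prime, n + 1))
    approx = 2 * m * (math.log(n) - math.log(n_prime))
    return exact, approx


exact, approx = corollary1(10**5, 10**3, 10**6)
print(abs(exact - approx) / approx < 1e-3)  # True
# Halving the number of nodes processes about 2 * ln 2 ~ 1.39 * m edges.
print(round(2 * math.log(2), 2))  # 1.39
```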
Actually, the performance is much better. This is due to the fact that an entry representing an edge (u, v) is reinserted in row min(u). Thus, the entries tend to cluster in the rows with the smallest indices, an effect that is rapidly self-reinforcing. For the analysis, it is crucial to fully understand how reinserted entries may in turn lead to new reinsertions. If, when processing node u, min(u) is added to the adjacency list of node v because v occurred in the adjacency list of u, then there are three cases to distinguish:

1. There was already an entry min(u) in the adjacency list of v. 

2. By the time node v is processed, min(v) = min(u).

3. By the time node v is processed, min(v) < min(u).

In the first case, min(u) is either not entered at all or removed when processing node v, and thus it does not lead to future insertions. The same is true for the second case: when processing node v, min(u) is the node with the smallest index in the adjacency list of v, and thus the edge (v, min(u)) is contracted and not reinserted. Only in the third case is there a reinsertion which can be traced back to the entry v in the adjacency list of u. This entry is made in the adjacency list of min(u). We call this a secondary reinsertion. We summarize:

Lemma 3. If an entry v in the adjacency list of node u leads to a secondary reinsertion, this is performed in the adjacency list of node min(u).

This is good, because min(u) tends to be a small number, and thus the number of nodes in the graph will have been strongly reduced before a secondary reinsertion is processed again.

Theorem 3. If the indices of the nodes are randomized, the expected number of operations of FAST_CONCOMPS on graphs with n nodes and m edges is bounded by $O(n + m \cdot \log\log n)$.

3.3 Version with Bundles 

In the further development of our algorithm, it is convenient to pretend that we are using an adjacency-matrix representation. Then the entries of the adjacency list of node u appear in column u. Because of Invariant 1, the lower-triangular positions are all empty at all times. The operation of the sequential algorithm consists of copying the entries from column u to row min(u).

In preparation for an external algorithm, we present a version of the algorithm working with bundles, each consisting of W nodes with consecutive indices. Bundle i, $0 \le i < n/W$, consists of all nodes with indices u, $i \cdot W \le u < (i+1) \cdot W$. This gives the subdivision of the adjacency matrix depicted in Figure 2. In the bundles we distinguish the entries in the upper part from those in the lower part. The entries in the triangular lower part induce a subgraph. The idea is to first solve this subproblem and to link the lists of all nodes in the same component together. This gives longer lists, which on average will contain smaller minimum values. This implies that the secondary reinsertions tend to happen in columns which lie even further to the left.

More precisely, Step 2 of FAST_CONCOMPS is rewritten as follows:




Fig. 2. Subdivision of the adjacency matrix in bundles. Position (0, 0) lies in the upper- 
left corner. 



2. for i = n/W - 1 to 0 do
     PROCESS_BUNDLE(i);

Here we are repeatedly calling the following subroutine: 

Proc PROCESS_BUNDLE 

1. Solve the CC problem for the subgraph induced by the nodes belonging to bundle i. For each node u, min(u) denotes the minimum index of all nodes in its component within this subgraph.

2. For each index u with min(u) = u do the following:

a. Compose the set $S_u$, being the union, over all nodes belonging to the same component as u, of all entries in the upper parts of their columns of the adjacency matrix.

b. For all $j < i$ set $v_j = \infty$. For each $v \in S_u$ determine the bundle j to which v belongs and set $v_j = \min\{v_j, v\}$.

c. $\mathrm{min}(u) = \min\{\mathrm{min}(u), \{v_j \mid j < i\}\}$.

d. For each $v \in S_u$, determine the bundle j to which v belongs. If $v \ne v_j$, then add an entry $v_j$ to the column of v; else, if $v \ne \mathrm{min}(u)$, then add an entry min(u) to the column of v if this entry has not yet been added before.

The relinking is illustrated in Figure 3. Using $O(n/W)$ additional storage, the above algorithm runs in $O(n/W + W + m_i)$ time, where $m_i$ gives the number of
entries in bundle i. Choosing W > the additional costs become negligible. 
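Steps 2b to 2d of PROCESS_BUNDLE amount to grouping the entries of $S_u$ by bundle and building a two-level star: every entry is linked to the minimum $v_j$ of its bundle, and every bundle minimum is linked to the new min(u). The Python sketch below illustrates this for a single component; the function name `relink_component` and the representation of "entry e in column x" as a pair (x, e) are our own assumptions, and duplicates in $S_u$ are simply merged.

```python
def relink_component(u, s_u, W):
    """Steps 2b-2d of PROCESS_BUNDLE for the component represented by u.

    s_u holds the entries found in the upper parts of the component's
    columns; they lie in bundles j < i, where bundle j = [j*W, (j+1)*W).
    Returns the new min(u) and the (column, entry) pairs to add.
    """
    entries = set(s_u)                    # repeated entries add nothing
    # 2b: v_j = smallest entry lying in bundle j
    v = {}
    for x in entries:
        j = x // W
        v[j] = min(v.get(j, x), x)
    # 2c: min(u) = min{min(u), v_j for all j}
    new_min = min([u, *v.values()])
    # 2d: link every entry to its bundle minimum, and every bundle
    # minimum (except the overall minimum) to min(u)
    added = []
    for x in sorted(entries):
        vj = v[x // W]
        if x != vj:
            added.append((x, vj))        # entry v_j goes into column x
        elif x != new_min:
            added.append((x, new_min))   # entry min(u) goes into column x
    return new_min, added


print(relink_component(13, [9, 5, 6, 1, 10, 9], W=4))
# (1, [(5, 1), (6, 5), (9, 1), (10, 9)])
```

In the example only the bundle minima 5 and 9 receive an entry from another bundle; the links (6, 5) and (10, 9) stay inside their bundles, which is the effect sketched in Figure 3.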

In general, the induced subgraphs need not have much structure. For example, if the graph consists of many small connected components, then it may take until the last few bundles before the structure of the components is revealed. The situation is better for random graphs. The reason is that a random graph from $\mathcal{G}_{n,m}$ has, with high probability, a giant connected component consisting of $n \cdot (1 - O(e^{-g}))$ nodes. So, for $g > \log n$, we may assume that all but constantly many nodes belong to the same component. Because the algorithm can be viewed as compressing the edges into a smaller set of nodes, eventually g will be sufficiently high to ensure a giant component in the induced subgraph. Let u be the node with the smallest index in this component. It is very likely that $S_u$ contains a node in bundle 0. But this implies that any secondary reinsertion will be performed in bundle 0, so once this happens, edges are reinserted at most two more times.

[Figure 3: panels "Before Relinking" and "After Relinking", bundles 0 to 3]

Fig. 3. When processing a connected component of bundle 3, new links are added to the other bundles so that the connectivity is preserved, while at the same time minimizing the number of links across the bundle boundaries.

Theorem 4. For a graph from $\mathcal{G}_{n,m}$, the expected number of operations of the blocked version of FAST_CONCOMPS is bounded by $O(n + m \cdot \log\log(n^2/(m \cdot W)))$.

3.4 External Algorithm 

The above blocked version of the CC algorithm can easily be turned into an efficient external algorithm. There are two possibilities:

- Choose W so that all entries in a bundle fit into the main memory. A disadvantage is that the density gradually increases and that W cannot be kept constant over the whole range. Nevertheless, with minor corrections, this idea works fine when W is taken so that $c \cdot g \cdot W < M$ for some suitable constant c, leaving some slack and sufficient space for file buffers.

- Choose $W = M/2$ and run the semi-external algorithm to solve the problems on the induced subgraphs. The remaining space in the main memory is used for file buffers. It is most convenient to have two files per bundle: one for the entries in the upper part and one for the entries in the lower part.

Applying the first idea, the above blocked version of FAST_CONCOMPS can be used with very few changes. In step 1 all edges are written to files corresponding to the bundles. During step 2, for each bundle, the file with the edges is read. The internal problem is solved, and updates to bundles with lower indices are added to their files. Thereafter the preliminary min-values are written. The questions that have to be answered during step 3 by nodes from other bundles are written to a single file. At the end of step 2, these questions are distributed over files corresponding to the bundles. During step 3, for each bundle the min-values are read and the received answers are used to compute the final min-values. Thereafter the questions to the nodes of this bundle can be answered. Finally the min-values are written.

Because eventually the whole graph is contracted onto the nodes with the smallest indices, and because we may expect particularly many reinsertions for the nodes with the very smallest indices, a small fraction of the internal memory is used for maintaining the adjacency matrix for these nodes. Using one bit per entry, it
is reasonable to do this for 2^^ nodes. This saves quite some I/O, because these 
entries might otherwise have been written many times. 

During the initialization, all edges of the graph are read and written once. During the right-to-left pass through the data, the added edges are written, and all edges are read once. Thus, if the total number of entries is $f \cdot m$, then the number of I/Os during the initialization is $4 \cdot m/B$, and during the right-to-left pass it is $(4 \cdot f - 2) \cdot m/B$. The number $q \cdot n$ of questions to nodes in other bundles is of a much smaller order; at worst, there is one question for every node. This induces at most $8 \cdot q \cdot n/B$ I/Os, because all questions are written and read at most once, result in an answer, and contain the destination and the sender. In addition, the min-values are read once and written twice:

$$\mathrm{I/O}_{\mathrm{total}} = \bigl((4 \cdot f + 2) \cdot m + (8 \cdot q + 3) \cdot n\bigr)/B. \qquad (1)$$
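The accounting that leads to the I/O total can be written out term by term. The Python sketch below (the function name `io_words_variant1` is ours) adds the contributions listed above in words, so dividing the result by B gives I/Os: $4 \cdot m$ for initialization, $(4 \cdot f - 2) \cdot m$ for the sweep, at most $8 \cdot q \cdot n$ for questions, and $3 \cdot n$ for reading the min-values once and writing them twice. The factor 8 for the questions is taken from the text as stated rather than derived.

```python
def io_words_variant1(f, q, m, n):
    """Words moved by the first external variant (divide by B for I/Os)."""
    init = 4 * m               # read and write the 2m input integers once
    sweep = (4 * f - 2) * m    # write (f-1)m new entries, read all f*m; 2 words each
    questions = 8 * q * n      # bound from the text for the q*n questions and answers
    min_values = 3 * n         # min-values read once and written twice
    return init + sweep + questions + min_values


# With f = 3 and q = 1 the coefficients are 14 (per edge) and 11 (per node).
print(io_words_variant1(3, 1, 1, 0), io_words_variant1(3, 1, 0, 1))  # 14 11
# Largest experiment: f = 2.23, q = 0.38, m = 16n, in words per node.
print(round(io_words_variant1(2.23, 0.38, 16, 1), 2))  # 180.76
```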

Applying the second idea has the advantage that there is no need to estimate 
the number of edges belonging to a bundle. Most of the algorithm remains the
same; there are only some changes in step 2. If there are separate files for the
upper and lower part of the bundles, then the file containing the edges belonging 
to the induced subgraph must be traversed once for solving the semi-external 
CC problem. Hereafter the file with the edges to relink must be traversed twice. 
There is no need to construct the sets explicitly. The whole algorithm can be 
implemented with one integer per node. 

The I/O of this algorithm is the same as before, except that the edges in the upper parts of the bundles are read twice instead of once. So, we get the following analogue of (1):

$$\mathrm{I/O}_{\mathrm{total}} = \bigl((6 \cdot f' + 2) \cdot m + (8 \cdot q' + 3) \cdot n\bigr)/B. \qquad (2)$$

Due to the larger value of W we will have $f' < f$ and $q' < q$. For moderate n this will have a considerable impact, and this second variant will be better. For $n \gg M$, the first variant will be better. The best approach is to apply a combination, switching from the first to the second strategy as soon as we may hope to discover some structure in the induced subgraphs. If $4 \cdot n/M \cdot B > M/2$, something which will never happen in practice, we must work with super-bundles whose entries are redistributed. This multiplies the I/O by a factor $\lceil \log_{M/B}(n/M) \rceil$.
Theorem 5. For a graph from $\mathcal{G}_{n,m}$, the expected number of I/Os is bounded by $O((n + m \cdot \log\log(n^2/(m \cdot M)))/B \cdot \lceil \log_{M/B}(n/M) \rceil)$.

Not considering the last factor, we see that the number of I/Os is linear in $n + m$ if $n/M = O(m/n)$.

3.5 Experiments 

The sequential version and the first variant of the external version of FAST_CONCOMPS have been implemented in C. External memory is managed with files, as this is far more efficient than using virtual memory. The size of the file buffers was set to 128 KB. The program was run on a Pentium IV PC with a 2.66 GHz clock, 2 GB of internal memory and a free 120 GB partition on a conventional 7200-rpm hard disk. The operating system was Red Hat Linux 8.2, and the program was compiled with gcc -O3. However, rather than on the time in seconds, we focus on the fundamental parameters f and q, which determine the number of I/Os as expressed in (1).

For random graphs the experiments show that, for all but a small fraction of the bundles at the end, the number of entries in a bundle lies very close to $2 \cdot W \cdot m/n$. Depending on the number of bundles $b = n/W$ and the density $g = m/n$, there was a more or less pronounced increase of the number of entries in the last few bundles. The resulting values of f as a function of b and g are given in Table 1.



Table 1. Experimental results for the value of f for random graphs. The bundle width W = in all cases.

b          30    60    120   240   480   960
f(b, 1)    1.98  2.07  2.13  2.19  2.24  2.29
f(b, 2)    2.27  2.45  2.58  2.69  2.79  2.87
f(b, 4)    2.07  2.28  2.54  2.72  2.83  2.93
f(b, 8)    1.84  2.00  2.14  2.30  2.53  2.65
f(b, 16)   1.63  1.81  1.95  2.05  2.13  2.23


Looking at the values in Table 1 in a qualitative way, we see that f(b, g) increases with b. This is not surprising, as there is no reason to believe that f(b, g) is bounded by a constant as a function of b. Furthermore, with increasing b, the positive effect of handling a certain fraction of the nodes within a bundle at the same time becomes smaller and smaller. More interesting is the dependence on g: the largest f-values are found for g = 4. The reason is that for smaller g, the effect that one edge disappears for every processed node becomes noticeable. For larger g, the minimum value in an adjacency list tends to be smaller, thereby leading to shorter sequences of reinsertions for an entry. For larger g it is also important that by the time the problem gets hard, in the last fraction of bundles to process, sizeable connected components show up in the induced subgraphs. Because of these effects, the number of significant values in Table 1 is too small to determine the constants in the presumed development $f(b, g) = \alpha + \beta \cdot \log\log(b/g)$. The only thing we can say is that the results do not contradict such a development.

We consider the result for the largest graph in more detail: processing the random graph with $n = 960 \cdot W$ and $m = 16 \cdot n$ took 72526 seconds. This time is subdivided as follows: 12735 seconds for generating the edges and distributing them over the buckets; 58274 seconds for step 2 of FAST_CONCOMPS without handling bundle 0; 699 seconds for handling bundle 0 and reallocating the questions; 818 seconds for step 3. Substituting $f = 2.23$, $q = 0.38$ and the values for n and m in (1), we find that in total 694 GB are read and written, on average 9.6 MB per second. This is about 60% of the maximum transfer rate of this hard disk. The remaining time is used for internal computation. The computation required at most 120 GB of secondary memory and 828 MB of main memory.
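As a quick consistency check on the reported figures, the Python snippet below adds up the four phases and converts the 694 GB transfer volume into an average rate. It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes); the text does not state which convention was used.

```python
phases = {                      # seconds, as reported for the largest graph
    "generate and distribute edges": 12735,
    "step 2 without bundle 0": 58274,
    "bundle 0 and reallocating questions": 699,
    "step 3": 818,
}
total = sum(phases.values())
rate = 694e9 / total / 1e6      # decimal units assumed: bytes -> MB per second
print(total, round(rate, 1))    # 72526 9.6
```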

We also tested the performance of FAST_CONCOMPS for several other classes of graphs which differ considerably in their number of components, diameter and structure. These were star graphs, a kind of interval graph, partial tori and trees. For all of these classes the results (presented in the full version) were better than those for random graphs.

Although FAST_CONCOMPS was designed as a node-reduction subroutine, it has impressive stand-alone performance. Extrapolating our experiments and considering the results for other graph classes, it is safe to say that $f < 3$ for all practically relevant graphs. Since also $q \le 1$ (at worst there is one question per node), substituting in (1) then gives

$$\mathrm{I/O}_{\mathrm{FAST\_CONCOMPS}} \le (14 \cdot m + 11 \cdot n)/B.$$

Alternatively, we can use FAST_CONCOMPS in a recursive algorithm together with the $(10 \cdot m + o(m) + O(n))/B$ solution of Theorem 1: if the current density of the graph satisfies $g > c$, for some constant c, then we apply edge reduction, else node reduction. Taking $c = 16$ and performing some optimizations, Corollary 1 gives us the upper bound

$$\mathrm{I/O}_{\mathrm{combined}} < (10 \cdot m + 16 \cdot \sqrt{m} + 20 \cdot n)/B. \qquad (3)$$

Clearly, both approaches constitute a tremendous improvement over the algorithms sketched in [1] and [6], which after several optimizations still need at least $(40 \cdot m + O(n))/B$ I/Os.

4 List Ranking and Tree Rooting 

List ranking can be viewed as a weighted and directed form of CC where all connected components happen to be lists. We sketch how to modify the CC algorithm from Section 3. The entries in the node lists are complemented by signed distance values, the sign denoting the edge direction (positive: outgoing, negative: incoming). For example, $(w, -3)$ in the list of u stands for a path of length 3 from w to u. If during step 2 node u is processed and has list entries $(v, l_v)$ and $(w, l_w)$, $v < w < u$, then a new entry $(v, l_v - l_w)$ is added to the list of node w. An external version of the above algorithm is constructed along the same lines as for external connected components.
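The signed-distance contraction can be tried out sequentially. The Python sketch below is our own illustration, not the authors' code: an entry `lists[u][v]` stores the signed distance from u to v, so a positive value means v lies after u. Processing u contracts every other entry w onto the smallest entry v, inserting $(v, l_v - l_w)$ into the list of w as described above. The text does not spell out step 3 for list ranking, so the final passes (positions relative to the component minimum, then picking the node with the largest position as the last node) are our assumption. The input is assumed to be well-formed lists without self-loops.

```python
from collections import defaultdict


def rank_lists(n, edges):
    """edges: pairs (u, v) meaning that v follows u in its list.

    Returns, for every node, (last node of its list, distance to it).
    """
    # lists[u][v] = pos(v) - pos(u): positive = outgoing, negative = incoming.
    # As in the CC algorithm, only entries v < u are stored at u.
    lists = defaultdict(dict)
    for u, v in edges:
        if v < u:
            lists[u][v] = 1
        else:
            lists[v][u] = -1
    rep = list(range(n))   # node that u was contracted onto
    off = [0] * n          # pos(u) - pos(rep[u])
    # Step 2: right-to-left contraction, carrying signed distances.
    for u in range(n - 1, -1, -1):
        entries = lists.pop(u, {})
        if not entries:
            continue
        v = min(entries)
        lv = entries[v]
        rep[u], off[u] = v, -lv
        for w, lw in entries.items():
            if w != v:
                lists[w][v] = lv - lw   # pos(v) - pos(w), with v < w < u
    # Step 3 (assumed): positions relative to the component minimum.
    pos = [0] * n
    root = list(range(n))
    for u in range(n):
        if rep[u] != u:
            root[u] = root[rep[u]]
            pos[u] = pos[rep[u]] + off[u]
    # Extra pass (assumed): the last node has the largest position.
    last = {}
    for u in range(n):
        r = root[u]
        if r not in last or pos[u] > pos[last[r]]:
            last[r] = u
    return [(last[root[u]], pos[last[root[u]]] - pos[u]) for u in range(n)]


print(rank_lists(4, [(2, 0), (0, 3), (3, 1)]))
# [(1, 2), (1, 0), (1, 3), (1, 1)]
```

When the edges of a tree point from child to parent, the root is the unique node with the largest position, so the same function also roots trees: `rank_lists(4, [(0, 2), (1, 2), (3, 0)])` yields root 2 for every node, with depths 1, 1, 0 and 2.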

Theorem 6. If $n < M^2/(32 \cdot B)$, then ranking a set of lists of total length n, presented as a set of edges, can be performed deterministically with $22 \cdot n/B$ I/Os for all inputs.

Previously, external list-ranking was solved via simulation of a parallel algorithm 
that uses independent set removal in order to reduce the number of nodes by 
a constant factor in each phase [2]. Probably the best implementation is given 
in [11]. It can be estimated that our new approach, which is also simpler, is at 
least four times faster. Tree rooting can be solved with the same algorithm. The 
algorithm does not even need to know that the graph is a tree. 

Acknowledgement. Ulrich Meyer was involved in an earlier version of this 
paper. 
Abstract. We present improved cache-oblivious data structures and algorithms for breadth-first search and the single-source shortest path problem on undirected graphs with non-negative edge weights. Our results remove the performance gap between the currently best cache-aware algorithms for these problems and their cache-oblivious counterparts. Our shortest-path algorithm relies on a new data structure, called bucket heap, which is the first cache-oblivious priority queue to efficiently support a weak DecreaseKey operation.



1 Introduction 

Breadth-first search (BFS) and the single-source shortest path (SSSP) problem are fundamental combinatorial optimization problems with numerous applications. SSSP is defined as follows: Let $G = (V, E)$ be a graph with V vertices and E edges, let s be a distinguished vertex of G, and let u be an assignment of non-negative real weights to the edges of G. The weight of a path is the sum of the weights of its edges. We want to find, for every vertex v that is reachable from s, the weight dist(s, v) of a minimum-weight ("shortest") path from s to v. BFS can be seen as the unweighted version of SSSP.

Both problems are well understood in the RAM model, where the cost of a 
memory access is assumed to be independent of the accessed memory location. 
However, modern computers contain a hierarchy of memory levels; the cost of 
a memory access depends on the currently lowest memory level that contains 
the accessed element. This is not accounted for in the RAM model, and current
BFS and SSSP-algorithms, when run in memory hierarchies, turn out to be 
notoriously inefficient for sparse input graphs. The purpose of this paper is to 
provide improved data structures and algorithms for BFS and SSSP under the 
currently most powerful model for multi-level memory hierarchies. 

Models for memory hierarchies. The most widely used model for the design of 
cache-aware algorithms is the I/O-model of Aggarwal and Vitter [2]. This model 
assumes a memory hierarchy consisting of two levels; the lower level has size M;
data is transferred between the two levels in blocks of B consecutive data items. 
The complexity of an algorithm is the number of blocks transferred (I/Os). The 
parameters M and B are assumed to be known to the algorithm. The strength 
of the I/O-model is its simplicity, while it still adequately models the situation 
when the I/Os between two levels of the memory hierarchy dominate the running 
time of the algorithm; this is often the case when the size of the data significantly 
exceeds the size of main memory. Cache-oblivious algorithms are designed to be 
I/O-efficient without knowing M or B; that is, they are formulated in the RAM 
model and analyzed in the I/O-model, assuming that the memory transfers are 
performed by an optimal offline paging algorithm. Since the analysis holds for any 
block and memory sizes, it holds for all levels of a multi-level memory hierarchy 
(see [16] for details). Thus, the cache-oblivious model elegantly combines the 
simplicity of the I/O-model with a coverage of the entire memory hierarchy. 

A comprehensive list of results for the I/O-model has been obtained; see [3,19,22] and the references therein. One of the fundamental facts is that, in the I/O-model, comparison-based sorting of N elements takes $\Theta(\mathrm{Sort}(N))$ I/Os in the worst case, where $\mathrm{Sort}(N) = \frac{N}{B} \log_{M/B} \frac{N}{B}$.
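The sorting bound is easy to evaluate for concrete parameters. The Python helper below (the name `sort_bound` is ours) computes $\mathrm{Sort}(N) = \frac{N}{B} \log_{M/B} \frac{N}{B}$ with constant factors dropped; `math.log` accepts the base as its second argument.

```python
import math


def sort_bound(N, M, B):
    """Sort(N) = (N/B) * log_{M/B}(N/B), with constant factors dropped."""
    return (N / B) * math.log(N / B, M / B)


# N = 2^30 items, M = 2^20, B = 2^10: N/B = 2^20 blocks, and log base 2^10 gives 2.
print(round(sort_bound(2**30, 2**20, 2**10)))  # 2097152
```

The result, $2^{21}$, means that each of the $2^{20}$ blocks is transferred a constant number of times per merging level, with two levels needed.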

For the cache-oblivious model, Frigo et al. developed optimal cache-oblivious algorithms for matrix multiplication, matrix transposition, fast Fourier transform, and sorting [16]. The cache-oblivious sorting bound matches that for the I/O-model: $O(\mathrm{Sort}(N))$ I/Os. After the publication of [16], a number of results for the cache-oblivious model have appeared; see [13,19] for recent surveys.

Some results in the cache-oblivious model, in particular those concerning sorting and algorithms and data structures that can be used to sort, such as priority queues, are proved under the assumption that $M \ge B^2$. This is also known as the tall-cache assumption. In particular, this assumption is made in the Funnelsort algorithm of Frigo et al. [16]. A variant termed Lazy-Funnelsort [6] works under the weaker tall-cache assumption that $M \ge B^{1+\varepsilon}$ for any fixed $\varepsilon > 0$.

Recently, it has been shown [8] that a tall-cache assumption is necessary for cache-oblivious comparison-based sorting algorithms.

Previous and related work. Graph algorithms for the I/O-model have received considerable attention in recent years. Many efficient cache-aware algorithms do have a cache-oblivious counterpart that achieves the same performance; see Table 1. Despite these efforts, little progress has been made on the solution of the SSSP-problem with general non-negative edge weights using either cache-aware or cache-oblivious algorithms: The best known lower bound is $\Omega(\mathrm{Sort}(E))$ I/Os, which can be obtained through a reduction from list ranking; but the currently best algorithm [17] performs $O(V + (E/B)\log_2(E/B))$ I/Os on undirected graphs. For $E = O(V)$, this is hardly better than naively running Dijkstra's internal-memory algorithm [14,15] in external memory, which would take $O(V\log_2 V + E)$ I/Os. On dense graphs, however, the algorithm is efficient. The algorithm of [17] is not cache-oblivious, because the applied external-memory priority queue based on the tournament tree is not cache-oblivious. Cache-oblivious priority queues exist [4,7]; but none of them efficiently supports a DecreaseKey operation. Indeed, the tournament tree is also the only cache-aware priority queue that supports at least a weak form of this operation.

Table 1. I/O-bounds for some fundamental graph problems.

Problem | Best cache-oblivious result | Best cache-aware result
---|---|---
List ranking | $O(\mathrm{Sort}(V))$ [4] | $O(\mathrm{Sort}(V))$ [11]
Euler Tour | $O(\mathrm{Sort}(V))$ [4] | $O(\mathrm{Sort}(V))$ [11]
Spanning tree/MST | $O(\mathrm{Sort}(E)\cdot\log\log V)$ [4] | $O(\mathrm{Sort}(E)\cdot\log\log(VB/E))$ [5]
 | $O(\mathrm{Sort}(E))$ (randomized) [1] | $O(\mathrm{Sort}(E))$ (randomized) [1]
Undirected BFS | $O(V + \mathrm{Sort}(E))$ [21] | $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \sqrt{VE/B})$ [18]
 | $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\log V + \sqrt{VE/B})$ New |
 | $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\cdot\frac{1}{\varepsilon}\cdot\log\log V + \sqrt{VE/B}\cdot\sqrt{VB/E}^{\,\varepsilon})$ New |
Directed BFS & DFS | $O((V + E/B)\cdot\log V + \mathrm{Sort}(E))$ [4] | $O((V + E/B)\cdot\log V + \mathrm{Sort}(E))$ [10]
Undirected SSSP | $O(V + (E/B)\cdot\log(E/B))$ New | $O(V + (E/B)\cdot\log(E/B))$ [17]

For bounded edge weights, an improved external-memory SSSP-algorithm has been developed recently [20]. This algorithm is an extension of the currently best external-memory BFS-algorithm [18], Fast-BFS, which performs $O(\sqrt{VE/B} + \mathrm{Sort}(E) + \mathrm{ST}(E))$ I/Os, where $\mathrm{ST}(E)$ is the number of I/Os required to compute a spanning tree (see Table 1). Again, the key data structure used in Fast-BFS is not cache-oblivious, which is why the currently best cache-oblivious BFS-algorithm is that of [21].



Our results. In Section 2, we develop the first cache-oblivious priority queue, called bucket heap, that supports an Update operation, which is a combined Insert and DecreaseKey operation. The amortized cost of operations Update, Delete, and DeleteMin is $O((1/B)\log_2(N/B))$, where $N$ is the number of distinct elements in the priority queue. Using the bucket heap, we obtain a cache-oblivious shortest-path algorithm for undirected graphs with non-negative edge weights that matches the performance of the best cache-aware algorithm for this problem: $O(V + (E/B)\log_2(E/B))$ I/Os. Independently of our work, the bucket heap as well as a cache-oblivious version of the tournament tree have simultaneously been developed by Chowdhury and Ramachandran [12].
In Section 3, we develop a new cache-oblivious algorithm for undirected BFS. The algorithm comes in two variants: The first variant performs $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\log V + \sqrt{VE/B})$ I/Os; the second variant performs $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\cdot\frac{1}{\varepsilon}\cdot\log\log V + \sqrt{VE/B}\cdot\sqrt{VB/E}^{\,\varepsilon})$ I/Os, for any $\varepsilon > 0$. Here, $\mathrm{ST}(E)$ denotes the cost of cache-obliviously finding a spanning tree.

2 The Bucket Heap and Undirected Shortest Paths 

In this section, we describe the bucket heap, which is a priority queue that supports an Update (a weak DecreaseKey) operation and does so in the same I/O-bound as the tournament tree of [17]. Using the bucket heap, the SSSP-algorithm of [17] becomes cache-oblivious. Similar to the tournament tree, the bucket heap supports the following three operations, where we refer to an element $x$ with priority $p$ as element $(x,p)$:

Update$(x,p)$ inserts element $(x,p)$ into the priority queue if $x$ is not in the priority queue; otherwise, it replaces the current element $(x,p')$ in the priority queue with $(x,\min(p,p'))$.

Delete$(x)$ removes element $x$ from the priority queue.

DeleteMin removes and reports the minimal element in the priority queue.
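The semantics of the three operations can be pinned down by a naive reference model. The sketch below is a plain dictionary-based priority queue with no I/O efficiency at all (the class name `ReferencePQ` is hypothetical); it only serves as a specification against which an implementation such as the bucket heap could be tested.

```python
from typing import Dict, Hashable, Optional, Tuple


class ReferencePQ:
    """Naive model of the Update/Delete/DeleteMin interface (not I/O-efficient)."""

    def __init__(self) -> None:
        self._prio: Dict[Hashable, float] = {}

    def update(self, x: Hashable, p: float) -> None:
        # Insert (x, p) if x is absent; otherwise keep the smaller priority.
        self._prio[x] = min(p, self._prio.get(x, p))

    def delete(self, x: Hashable) -> None:
        # Remove x if present; deleting an absent element is a no-op.
        self._prio.pop(x, None)

    def delete_min(self) -> Optional[Tuple[Hashable, float]]:
        # Remove and report the element with minimal priority (None if empty).
        if not self._prio:
            return None
        x = min(self._prio, key=self._prio.__getitem__)
        return x, self._prio.pop(x)
```

For example, `update("a", 5)` followed by `update("a", 7)` leaves `a` with priority 5, exactly as Update$(x,p)$ keeps $\min(p,p')$.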

The bucket heap consists of $q$ buckets $B_1, B_2, \ldots, B_q$ and $q+1$ signal buffers $S_1, S_2, \ldots, S_{q+1}$, where $q$ varies over time, but is always at most $\lceil \log_4 N \rceil$. The capacity of bucket $B_i$ is $2^{2i}$; buffer $S_i$ has capacity $2^{2i-1}$. In order to allow for temporary overflow, we allocate $2^{2i+1}$ memory locations for bucket $B_i$ and $2^{2i}$ memory locations for buffer $S_i$. We store all buckets and buffers consecutively in memory, in the following order: $B_1, S_1, B_2, S_2, \ldots, B_q, S_q, S_{q+1}$.
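The consecutive layout can be tabulated directly from the allocation sizes above. This minimal sketch (the function `layout` is our illustration, not part of the paper) returns the offset and allocated size of each array in the order $B_1, S_1, \ldots, B_q, S_q, S_{q+1}$; the sizes grow geometrically, so the first few levels occupy a short prefix of the memory region.

```python
from typing import List, Tuple


def layout(q: int) -> List[Tuple[str, int, int]]:
    """(name, offset, allocated size) for B_1, S_1, ..., B_q, S_q, S_{q+1}."""
    arrays = []
    for i in range(1, q + 1):
        arrays.append((f"B{i}", 2 ** (2 * i + 1)))    # bucket B_i: capacity 2^(2i), space 2^(2i+1)
        arrays.append((f"S{i}", 2 ** (2 * i)))        # buffer S_i: capacity 2^(2i-1), space 2^(2i)
    arrays.append((f"S{q + 1}", 2 ** (2 * (q + 1))))  # the extra buffer S_{q+1}
    result, offset = [], 0
    for name, size in arrays:
        result.append((name, offset, size))
        offset += size
    return result


for name, offset, size in layout(2):
    print(name, offset, size)
```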

Buckets $B_1, B_2, \ldots, B_q$ store the elements currently in the priority queue. We maintain the invariant that for any two buckets $B_i$ and $B_j$ with $i < j$ and any two elements $(x,p) \in B_i$ and $(y,q) \in B_j$, $p \le q$.

Buffers $S_1, S_2, \ldots, S_{q+1}$ store three types of signals, which we use to update the bucket contents in a lazy manner: Update$(x,p)$ and Delete$(x)$ signals are used to implement Update$(x,p)$ and Delete$(x)$ operations. A Push$(x,p)$ signal is used to push elements from a bucket $B_i$ to a bucket $B_{i+1}$ when $B_i$ overflows. Every signal has a time stamp corresponding to the time when the operation posting this signal was performed. The time stamp of an element in a bucket is the time stamp of the Update signal that led to its insertion.

The three priority queue operations are implemented as follows: A DeleteMin operation uses the Fill operation below to make sure that bucket $B_1$ is non-empty and, hence, contains the element with lowest priority. This element is removed and returned. An Update$(x,p)$ or Delete$(x)$ operation inserts the corresponding signal into $S_1$ and empties $S_1$ using the Empty operation below. Essentially, all the work to update the contents of the bucket heap is delegated to two auxiliary procedures: Procedure Empty$(S_i)$ empties the signal buffer $S_i$, applies these signals to bucket $B_i$, and inserts appropriate signals into buffer $S_{i+1}$. If this leads to an overflow of buffer $S_{i+1}$, the procedure is applied recursively to $S_{i+1}$. Procedure Fill$(B_i)$ fills an underfull bucket $B_i$ with the smallest $2^{2i}$ elements in buckets $B_i, \ldots, B_q$. The details of these procedures are as follows:

Empty$(S_i)$

1 If $i = q + 1$, then increase $q$ by one and create two new arrays $B_q$ and $S_{q+1}$.
2 Scan bucket $B_i$ to determine the maximal priority $p'$ of the elements in $B_i$. (If $i = q$ and $S_{q+1} = \emptyset$, then $p' = \infty$. Otherwise, if $B_i = \emptyset$, then $p' = -\infty$.)
3 Scan buffer $S_i$ and bucket $B_i$ simultaneously and perform the following operations for each signal in $S_i$:
    Update$(x,p)$: If $B_i$ contains an element $(x,p'')$, replace this element with $(x,\min(p,p''))$ and mark the Update$(x,p)$ signal in $S_i$ as handled. If $x$ is not in $B_i$, but $p \le p'$, insert $(x,p)$ into $B_i$ and replace the Update$(x,p)$ signal with a Delete$(x)$ signal. If $x$ is not in $B_i$ and $p > p'$, do nothing.
    Push$(x,p)$: If there is an element $(x,p'')$ in $B_i$, replace it with $(x,p)$. Otherwise, insert $(x,p)$ into $B_i$. Mark the Push$(x,p)$ signal as handled.
    Delete$(x)$: If element $x$ is in $B_i$, delete it.
4 If $i < q$ or $S_{i+1}$ is non-empty, then scan buffers $S_i$ and $S_{i+1}$ and insert all unhandled signals in $S_i$ into $S_{i+1}$. $S_i \leftarrow \emptyset$
5 If $|B_i| > 2^{2i}$, then find the $2^{2i}$-th smallest priority $p$ in $B_i$. Scan bucket $B_i$ and buffer $S_{i+1}$ twice: The first scan removes all elements with priority greater than $p$ from $B_i$ and inserts corresponding Push signals into $S_{i+1}$. The second scan removes $|B_i| - 2^{2i}$ elements with priority $p$ from $B_i$ and inserts corresponding Push signals into $S_{i+1}$.
6 If $|S_{i+1}| > 2^{2i+1}$, then Empty$(S_{i+1})$

Fill$(B_i)$

1 Empty$(S_i)$
2 If $|B_{i+1}| < 2^{2i}$ and $i < q$, then Fill$(B_{i+1})$
3 Find the $(2^{2i} - |B_i|)$-th smallest priority $p$ in $B_{i+1}$. Scan $B_i$ and $B_{i+1}$ twice. The first scan moves all elements with priority less than $p$ from $B_{i+1}$ to $B_i$. The second scan moves the correct number of elements with priority $p$ from $B_{i+1}$ to $B_i$ so that $B_i$ contains $2^{2i}$ elements or $B_{i+1}$ is empty at the end.
4 $q \leftarrow \max\{j : B_j \text{ or } S_{j+1} \text{ is non-empty}\}$
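Step 5 of Empty and step 3 of Fill share the same two-scan split around a selected priority $p$, where ties at $p$ are resolved in the second scan. The sketch below shows only this split on an in-memory list of $(x,p)$ pairs (the name `split_overflow` is hypothetical); it uses sorting in place of linear-time selection and says nothing about the I/O behaviour of the bucket heap.

```python
from typing import Hashable, List, Tuple

Element = Tuple[Hashable, float]


def split_overflow(bucket: List[Element], capacity: int) -> Tuple[List[Element], List[Element]]:
    """Keep exactly `capacity` smallest elements (if there are more); return (kept, pushed)."""
    if capacity < 1:
        raise ValueError("capacity must be positive")
    if len(bucket) <= capacity:
        return list(bucket), []
    # capacity-th smallest priority (sorting stands in for linear-time selection)
    p = sorted(prio for _, prio in bucket)[capacity - 1]
    # First scan: every element with priority greater than p leaves the bucket.
    kept = [e for e in bucket if e[1] <= p]
    pushed = [e for e in bucket if e[1] > p]
    # Second scan: remove len(kept) - capacity further elements whose priority equals p.
    excess = len(kept) - capacity
    result: List[Element] = []
    for e in kept:
        if excess > 0 and e[1] == p:
            pushed.append(e)
            excess -= 1
        else:
            result.append(e)
    return result, pushed
```

For Empty, `kept` stays in $B_i$ and `pushed` becomes Push signals for $S_{i+1}$. For Fill, calling it on $B_{i+1}$ with capacity $2^{2i} - |B_i|$ yields in `kept` the elements that move down to $B_i$.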

2.1 Correctness 

A priority queue is correct if, given a sequence $o_1, o_2, \ldots, o_t$ of priority queue operations, every DeleteMin operation $o_i$ returns the smallest element in the set $O_{i-1}$ constructed by operations $o_1, \ldots, o_{i-1}$ according to their definitions at the beginning of this section. In the following we use the term “element at level $j$” to refer to an element in bucket $B_j$ or to an Update signal in buffer $S_j$. We say that a Delete$(x)$ signal in a buffer $S_i$ hides an element $(x,p)$ at level $j$ if $i \le j$ and the time stamp of the Delete$(x)$ signal is greater than the time stamp of element $(x,p)$. An element $(x,p')$ hides an element $(x,p)$ if it is not hidden by a Delete$(x)$ signal and $p' < p$. An element that is not hidden is visible.
Observe that a hidden element can never be returned by a DeleteMin operation. Hence, it suffices to show that the set $V_i$ of elements that are visible after applying operations $o_1, o_2, \ldots, o_i$ equals the set $O_i$ and that every DeleteMin operation returns the minimal element in the current set of visible elements.

Lemma 1. A DeleteMin operation $o_i$ returns the minimal element in $V_{i-1}$.

Proof. We have already observed that the returned element $(y,q)$ is in $V_{i-1}$. If $(x,p)$ is the smallest visible element and $q > p$, we distinguish a number of cases: Let $i$ be the current level of element $(x,p)$, and let $j$ be the current level of element $(y,q)$. If $i \le j$, then element $(x,p)$ will move to lower buckets before or at the same time as $(y,q)$. Hence, $(x,p)$ would have to be returned before $(y,q)$, a contradiction. If $j < i$, we observe that element $(y,q)$ can only be represented by an Update$(y,q)$ signal in $S_j$ because neither bucket $B_i$ nor buffer $S_i$ can contain elements with priorities less than those of the elements in $B_j$. Hence, before $(y,q)$ can be moved to a bucket and ultimately returned, we must reach the case $i \le j$, in which case we would return $(x,p)$ first, as argued above. □



Lemma 2. For any sequence $o_1, o_2, \ldots, o_t$ and any $1 \le i \le t$, $V_i = O_i$.

Proof. The proof is by induction on $i$. For $i = 0$, the claim holds because $O_0$ and $V_0$ are empty. So assume that $i > 0$ and that $V_{i-1} = O_{i-1}$. If $o_i$ is a DeleteMin operation, it removes and returns the minimum element $(x,p)$ in $V_{i-1}$, so that $V_i = V_{i-1} \setminus \{(x,p)\} = O_{i-1} \setminus \{(x,p)\} = O_i$. If $o_i$ is a Delete$(x)$ operation, its insertion into $S_1$ hides all copies of $x$ in the priority queue. Hence, $V_i = V_{i-1} \setminus \{x\} = O_{i-1} \setminus \{x\} = O_i$. If $o_i$ is an Update$(x,p)$ operation, we distinguish three cases: If $x$ is not in $O_{i-1}$, there is no element $(x,p')$ that hides the Update$(x,p)$ signal. Hence, $V_i = V_{i-1} \cup \{(x,p)\} = O_{i-1} \cup \{(x,p)\} = O_i$. If there is an element $(x,p')$ in $O_{i-1}$ and $p' \le p$, element $(x,p')$ hides the Update$(x,p)$ signal, and $V_i = V_{i-1} = O_{i-1} = O_i$. If $p' > p$, the Update$(x,p)$ signal hides element $(x,p')$, and $V_i = (V_{i-1} \setminus \{(x,p')\}) \cup \{(x,p)\} = (O_{i-1} \setminus \{(x,p')\}) \cup \{(x,p)\} = O_i$. □



2.2 Analysis 

We assume that every element has a unique ID drawn from a total order and keep the elements in each bucket or buffer sorted by their IDs. This invariant allows us to perform updates by scanning buckets and buffers as in the description of procedures Empty and Fill. The amortized cost per scanned element is hence $O(1/B)$. In our analysis of the amortized complexity of the priority queue operations, we assume that $M$ is large enough to hold the first $\log_4 B$ buckets and buffers plus one cache block per stream that we scan. Under this assumption, operations Update, Delete, and DeleteMin, excluding the calls to Empty and Fill, do not cause any I/Os because bucket $B_1$ and buffer $S_1$ are always in cache. We have to charge the I/Os incurred by Empty and Fill operations to Update and Delete operations in such a manner that no operation is charged for more than $O((1/B)\log_2(N/B))$ I/Os. To achieve this, we define the following potential function, where $U$, $D$, and $P$ are the numbers of Update, Delete, and Push signals in buffers $S_1, S_2, \ldots, S_q$:

$$\Phi = (3U + D + 2P)\bigl(\log_4(N/B) + 3\bigr) + \sum_{i=\log_4 B}^{q} \Bigl( \bigl(|B_i| - |S_i|\bigr)\cdot\bigl(i - \log_4 B\bigr) + 2^{2i} \Bigr)$$

Since the actual cost of an Update or Delete operation is 0 and each of them increases the potential by $O(\log_2(N/B))$, the amortized cost of these operations is $O((1/B)\log_2(N/B))$ if we can show that all I/Os incurred by Fill and Empty operations can be paid by a sufficient potential decrease, where a potential decrease of $\Omega(B)$ is necessary to pay for a single I/O.

We distinguish two types of Empty operations: A regular Empty$(S_i)$ operation is one with $|S_i| > 2^{2i-1}$. If $|S_i| \le 2^{2i-1}$, the operation is early. The latter type of Empty$(S_i)$ operation may be triggered by a Fill$(B_i)$ operation.

First consider a regular Empty$(S_i)$ operation, where $i \le q$. If $i < \log_4 B$, the operation causes no I/Os, because $S_i$, $S_{i+1}$, and $B_i$ are in cache, and the potential does not increase. If $i \ge \log_4 B$, the cost is bounded by $O(2^{2i-1}/B)$ because only buffers $S_i$ and $S_{i+1}$ and bucket $B_i$ are scanned. Let $k$ be the increase of the size of bucket $B_i$, let $u$, $p$, and $d$ be the numbers of Update, Push, and Delete signals in $S_i$, and let $u'$, $p'$, and $d'$ be the numbers of such signals inserted into $S_{i+1}$. Then we have $u + d + p = |S_i| > 2^{2i-1}$. From the description of the Empty operation, we obtain that $u + d = u' + d'$ and $k + u' + p' \le u + p$. The change of potential is

$$\Delta\Phi \le (3u' + d' + 2p' - 3u - d - 2p)\bigl(\log_4(N/B) + 3\bigr) + (u + d + p + k)(i - \log_4 B) - (u' + d' + p')(i + 1 - \log_4 B).$$

Using elementary transformations, this gives $\Delta\Phi \le -2^{2i-1}$.

If $i = q + 1$, the actual cost of a regular Empty$(S_i)$ operation remains the same, but the change of potential is $\Delta\Phi \le -(3u + d + 2p)\bigl(\log_4(N/B) + 3\bigr) + (u + d + p + k)(i - \log_4 B) + 2^{2i}$. Again, this can be bounded by $\Delta\Phi \le -2^{2i-1}$.

Next we show that the cost of Fill and early Empty operations can be paid by a sufficient decrease of the potential $\Phi$. Consider a Fill$(B_i)$ operation, and let $j$ be the highest index such that a Fill$(B_j)$ operation is triggered by this Fill$(B_i)$ operation. Then the cost of all Fill and early Empty operations triggered by this Fill$(B_i)$ operation is $O(2^{2j}/B)$. If there are new elements inserted into the buckets during the Empty operations, a similar analysis as for regular Empty operations shows that the resulting potential change is zero or negative. Hence, it suffices to show that, excluding these insertions, the potential decreases by $\Omega(2^{2j})$. We distinguish two cases. If $q$ does not change, then the Fill$(B_j)$ operation moves at least $3 \cdot 2^{2j-2}$ elements from $B_{j+1}$ to $B_j$, which results in the desired potential decrease. If $q$ decreases, then $q > j$ before the Fill$(B_i)$ operation and $q$ decreases by at least one. This results in a potential decrease of at least $2^{2q} \ge 2^{2j}$. Hence, we obtain the following result.

Theorem 1. The bucket heap supports the operations Update, Delete, and DeleteMin at an amortized cost of $O((1/B)\log_2(N/B))$ I/Os.

The shortest path algorithm of [17] is cache-oblivious, except for its use of a cache-aware priority queue: the tournament tree. Since the bucket heap is cache-oblivious and achieves the same I/O-bound as the tournament tree, we obtain the following result by replacing the tournament tree with the bucket heap.

Corollary 1. There exists a cache-oblivious algorithm that solves the single-source shortest path problem on an undirected graph $G = (V,E)$ with non-negative edge weights and incurs at most $O(V + (E/B)\log_2(E/B))$ I/Os.
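To show how an Update/DeleteMin queue drives a shortest-path computation, here is ordinary internal-memory Dijkstra written against that interface. This is a sketch under stated assumptions, not the algorithm of [17]: it keeps an explicit set of settled vertices and tests membership by random access, which is precisely the kind of lookup an I/O-efficient algorithm has to avoid. The dictionary `pq` is a naive stand-in for the bucket heap, and the function name is hypothetical.

```python
import math
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def dijkstra_with_update(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Internal-memory Dijkstra written against an Update/DeleteMin interface."""
    pq: Dict[Hashable, float] = {source: 0.0}   # naive stand-in for an Update-capable queue
    dist: Dict[Hashable, float] = {}
    while pq:
        u = min(pq, key=pq.__getitem__)          # DeleteMin ...
        d = pq.pop(u)                            # ... removes u from the queue
        dist[u] = d                              # u is now settled
        for v, w in graph.get(u, []):
            if v in dist:                        # random-access "settled?" check
                continue
            pq[v] = min(d + w, pq.get(v, math.inf))  # Update(v, d + w)
    return dist
```

Each relaxation is a single Update, so the queue never has to know whether $v$ is already present; this is the property that makes Update a convenient replacement for Insert plus DecreaseKey.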

3 Cache-Oblivious Breadth-First Search 

In this section, we develop a cache-oblivious version of the undirected BFS-algorithm from [18]. As in [18], the actual BFS-algorithm is the one from [21], which generates the BFS levels one by one, using that, in an undirected graph, $L_{i+1} = \mathcal{N}(L_i) \setminus (L_i \cup L_{i-1})$, where $\mathcal{N}(S)$ is the set of nodes$^1$ that are neighbours of nodes in $S$, and $L_i$ is the set of nodes of the $i$'th level of the BFS tree with root $s$. The algorithm from [21] relies only on sorting and scanning and, hence, can be implemented cache-obliviously; this gives the cache-oblivious $O(V + \mathrm{Sort}(E))$ result from Table 1. The speed-up in [18] over [21] is due to a data structure which, for a query set $S$, returns $\mathcal{N}(S)$. The data structure does this in an I/O-efficient manner by exploiting that the query sets are the levels $L_0, L_1, L_2, \ldots$ of a BFS traversal. We provide a cache-oblivious version of this data structure.
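The level recurrence can be checked on a small graph. This minimal sketch uses Python sets where the algorithm of [21] would sort and merge lists, so it illustrates only the recurrence $L_{i+1} = \mathcal{N}(L_i) \setminus (L_i \cup L_{i-1})$, not the I/O behaviour; the function name is ours.

```python
from typing import Dict, Hashable, List, Set


def bfs_levels(adj: Dict[Hashable, List[Hashable]], s: Hashable) -> List[Set[Hashable]]:
    """BFS levels of an undirected graph via L_{i+1} = N(L_i) minus (L_i union L_{i-1})."""
    levels: List[Set[Hashable]] = [{s}]
    prev: Set[Hashable] = set()                  # L_{i-1}, empty before the first step
    while True:
        cur = levels[-1]
        neighbours = {w for v in cur for w in adj.get(v, [])}   # N(L_i)
        nxt = neighbours - cur - prev
        if not nxt:
            return levels
        levels.append(nxt)
        prev = cur
```

In an undirected graph every neighbour of a node in $L_i$ lies in $L_{i-1}$, $L_i$, or $L_{i+1}$, which is why subtracting only the two previous levels suffices.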



3.1 The Data Structure 

To construct the data structure, we first build a spanning tree for $G$ and then construct an Euler tour $T$ for the tree (using [4]). Next, we assign (by scanning and sorting) to each node $v$ the rank in $T$ of the first occurrence of $v$, and denote this value $r(v)$. As $T$ has length $2V - 1$, we have $r(v) \in [0; 2V-2]$.
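The tour and the ranks $r(v)$ are easy to illustrate on a small rooted tree. The sketch below walks the tree with an explicit stack in internal memory; it is not the cache-oblivious construction via [4], and both function names are hypothetical.

```python
from typing import Dict, Hashable, List

_DONE = object()   # sentinel marking an exhausted child iterator


def euler_tour(children: Dict[Hashable, List[Hashable]], root: Hashable) -> List[Hashable]:
    """Node sequence of an Euler tour of a rooted tree; it has length 2V - 1."""
    tour = [root]
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        _, it = stack[-1]
        child = next(it, _DONE)
        if child is _DONE:
            stack.pop()
            if stack:                             # walk back up to the parent
                tour.append(stack[-1][0])
        else:
            tour.append(child)                    # walk down to the child
            stack.append((child, iter(children.get(child, []))))
    return tour


def first_occurrence_ranks(tour: List[Hashable]) -> Dict[Hashable, int]:
    """r(v): index of the first occurrence of v in the tour."""
    r: Dict[Hashable, int] = {}
    for pos, v in enumerate(tour):
        r.setdefault(v, pos)
    return r
```

For the tree with root `a`, children `b` and `c`, and `d` below `b`, the tour is `a b d b a c a` (length $2 \cdot 4 - 1 = 7$), so $r(d) = 2$ and $r(c) = 5$. The tour section between them, `d b a c`, is a path of length 3 connecting $d$ and $c$, in line with Observation 1.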

Observation 1 ([18]). If for two nodes $u$ and $v$ the values $r(v)$ and $r(u)$ differ by $d$, then a section of the Euler tour constitutes a path in $G$ of length $d$ connecting $u$ and $v$; hence, $d$ is an upper bound on the distance between their BFS levels.

Let $g_0 < g_1 < \cdots < g_h$ be an increasing sequence of $h + 1$ integers where $g_0 = 1$, $g_{h-1} \le 2V - 2 < g_h$, and $g_i$ divides $g_{i+1}$. We will later consider two specific sequences, namely $g_i = 2^i$ and one for which $g_i = \Theta(2^{(1+\varepsilon)^i})$. For each integer $g_i$, we can partition the nodes into groups of at most $g_i$ nodes each by letting the $k$'th group $V_{ki}$ be all nodes $v$ for which $k g_i \le r(v) < (k+1) g_i$. We call a group $V_{ki}$ of nodes a $g_i$-node-group and call its set of adjacency lists $\mathcal{N}(V_{ki})$ a $g_i$-edge-group. Since $g_i$ divides $g_{i+1}$, the groups form a hierarchy of $h + 1$ levels, with level $h$ containing one group with all nodes and level 0 containing $2V - 1$ groups of at most one node.
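Grouping by rank amounts to integer division: node $v$ belongs to group $k = \lfloor r(v)/g_i \rfloor$. The sketch below (hypothetical helper names) partitions nodes for a given $g_i$ and builds the doubling sequence $g_i = 2^i$, stopping at the first $g_h > 2V - 2$. Because $g_i$ divides $g_{i+1}$, a node's group at level $i+1$ is obtained from its group at level $i$ by one more integer division, which is the hierarchy property.

```python
from typing import Dict, Hashable, List


def node_groups(r: Dict[Hashable, int], g: int) -> Dict[int, List[Hashable]]:
    """Partition nodes into g-node-groups: group k holds v with k*g <= r(v) < (k+1)*g."""
    groups: Dict[int, List[Hashable]] = {}
    for v in sorted(r, key=r.__getitem__):       # nodes in order of increasing rank
        groups.setdefault(r[v] // g, []).append(v)
    return groups


def doubling_sequence(num_nodes: int) -> List[int]:
    """g_i = 2**i for i = 0..h, where g_h is the first value exceeding 2V - 2."""
    seq = [1]
    while seq[-1] <= 2 * num_nodes - 2:
        seq.append(2 * seq[-1])
    return seq
```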

The data structure consists of $h$ levels $G_1, \ldots, G_h$, where each level stores a subset of the adjacency lists of the graph $G$. Each adjacency list $\mathcal{N}(v)$ will appear in exactly one of the levels, unless it has been removed from the structure.

$^1$ As shorthand for $\mathcal{N}(\{v\})$ we will use $\mathcal{N}(v)$.
Since the query sets of the BFS-algorithm are the BFS-levels $L_0, L_1, L_2, \ldots$, each node $v$ is part of a query set $S$ exactly once. Its adjacency list $\mathcal{N}(v)$ will leave the data structure when this happens.

Initially, all adjacency lists are in level $h$. Over time, the query procedure GetEdgeLists moves the adjacency list of each node from higher numbered to lower numbered levels, until the adjacency list eventually reaches level $G_1$ and is removed. GetEdgeLists is a recursive procedure that takes as input a query set $S$ of nodes and a level number $i$ to query. The output consists of the $g_{i-1}$-edge-groups stored at level $G_i$ for which the corresponding $g_{i-1}$-node-group contains one or more nodes in $S$. The BFS-algorithm will query the data structure by calling GetEdgeLists$(S, 1)$, which will return $\mathcal{N}(v)$, for all $v$ in $S$.



GetEdgeLists$(S, i)$

1 $S' \leftarrow \{v \in S \mid \mathcal{N}(v) \text{ is not stored in level } G_i\}$
2 if $S' \neq \emptyset$:
3     $X \leftarrow$ GetEdgeLists$(S', i + 1)$
4     for each $g_i$-edge-group $g$ in $X$
5         insert $g$ in $G_i$
6 for each $g_{i-1}$-edge-group $\gamma$ in $G_i$ containing $\mathcal{N}(v)$ for some $v \in S$
7     remove $\gamma$ from $G_i$
8     include $\gamma$ in the output set
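The control flow of GetEdgeLists can be traced with a toy in-memory model. In the sketch below (the class `LevelStructure` is hypothetical), each level is a dictionary from node to adjacency list; it replaces the arrays $B_i$, the indexes $A_i$, and the per-group indexes described next, so it reproduces only which lists move between levels, not the I/O cost. Comments map statements to the numbered lines above.

```python
from typing import Dict, Hashable, List, Set

AdjLists = Dict[Hashable, List[Hashable]]


class LevelStructure:
    """Toy in-memory model of levels G_1..G_h; ignores the arrays B_i, A_i and all I/O."""

    def __init__(self, rank: Dict[Hashable, int], adj: AdjLists, g: List[int]) -> None:
        self.rank, self.g = rank, g
        self.h = len(g) - 1                      # g = [g_0, ..., g_h]
        # levels[i] maps v to N(v) while N(v) is stored in G_i (index 0 is unused).
        self.levels: List[AdjLists] = [{} for _ in range(self.h + 1)]
        self.levels[self.h] = {v: list(adj.get(v, [])) for v in rank}

    def get_edge_lists(self, S: Set[Hashable], i: int = 1) -> AdjLists:
        level = self.levels[i]
        missing = {v for v in S if v not in level}                # line 1: S'
        if missing and i < self.h:                                # line 2
            level.update(self.get_edge_lists(missing, i + 1))     # lines 3-5
        g_prev = self.g[i - 1]
        wanted = {self.rank[v] // g_prev for v in S}              # g_{i-1}-groups meeting S
        out = {v: lst for v, lst in level.items()
               if self.rank[v] // g_prev in wanted}               # line 6
        for v in out:
            del level[v]                                          # line 7
        return out                                                # line 8
```

Querying the BFS levels `{a}`, `{b, c}`, `{d}` in turn returns exactly the adjacency lists of the queried nodes, while neighbouring lists (such as that of `b` after the first query) are left waiting at lower levels for the next query.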

Next we describe how we represent a level $G_i$ so that GetEdgeLists can be performed efficiently. By induction on time, it is clear that the edges stored in level $G_i$ always constitute a set of $g_i$-edge-groups from each of which zero or more $g_{i-1}$-edge-groups have been removed. Since $g_{i-1}$ divides $g_i$, the part of a $g_i$-edge-group residing at level $G_i$ is a collection of $g_{i-1}$-edge-groups. We store the adjacency lists of level $G_i$ in an array $B_i$. Each $g_{i-1}$-edge-group is stored in consecutive locations of $B_i$, and the adjacency lists $\mathcal{N}(v)$ of a $g_{i-1}$-edge-group occupy these locations in order of increasing ranks $r(v)$. The $g_{i-1}$-edge-groups of each $g_i$-edge-group are also stored in order of increasing ranks of the nodes involved, but empty locations may exist between the $g_{i-1}$-edge-groups. However, the entire array $B_i$ will have a number of locations which is at most a constant times the number of edges it contains. This will require $B_i$ to shrink and grow appropriately. The arrays $B_1, B_2, \ldots, B_h$ will be laid out in $O(E)$ consecutive memory locations. Due to space restrictions, we refer to the full version [9] of the paper for a description of how to maintain this layout at sufficiently low cost.

In order to keep track of the $g_i$-edge-groups within $B_i$ we maintain an index $A_i$, which is an array of entries $(k,p)$, one for every $g_i$-edge-group present in $G_i$. Here $k$ is the number of the corresponding $g_i$-node-group $V_{ki}$, and $p$ is a pointer to the start of the $g_i$-edge-group in $B_i$. The entries of $A_i$ are sorted by their first components. The indexes $A_1, A_2, \ldots, A_h$ occupy consecutive locations of one of two arrays $A'$ and $A''$ of size $O(V)$. Finally, every $g_i$-edge-group $g$ of a level $G_i$ will contain an index of the $g_{i-1}$-edge-groups it presently contains. This index consists of the first and last edge of each $g_{i-1}$-edge-group $\gamma$ together with pointers to the first and last locations of the rest of $\gamma$. These edges are kept at the front of $g$, in the same order as the $g_{i-1}$-edge-groups to which they belong.

We now describe how each step of GetEdgeLists is performed. We assume that every query set $S$ of nodes is sorted by assigned rank $r(v)$, which can be ensured by sorting the initial query set before the first call. In line 1 of the algorithm, we find $S'$ by simultaneously scanning $S$ and $A_i$, using that, if $(k,p)$ is an entry of $A_i$, all nodes $v \in S$ for which $k g_i \le r(v) < (k+1) g_i$ will have $\mathcal{N}(v)$ residing in the $g_i$-edge-group pointed to by $p$ (otherwise, $\mathcal{N}(v)$ would have been found earlier in the recursion, i.e., for a smaller $i$, and $v$ would have been removed from $S$). In short, $S'$ is the subset of $S$ not covered by the entries in $A_i$.

In line 5, when a $g_i$-edge-group $g$ is to be inserted into level $G_i$, the index of the $g_{i-1}$-edge-groups of $g$ is generated by scanning $g$, and $g$ (now with the index) is appended to $B_i$. An entry for $A_i$ is generated. When the for-loop in line 4 is finished, the set of new $A_i$ entries is sorted by their first components and merged with the current $A_i$. Specifically, the merging writes $A_i$ to $A''$ if $A_i$ currently occupies $A'$, and vice versa, implying that the location of the entire set of $A_i$'s alternates between $A'$ and $A''$ for each call GetEdgeLists$(S, 1)$.

In line 6, we scan $S$ and the updated $A_i$ to find the entries $(k,p)$ pointing to the relevant $g_i$-edge-groups of the updated $B_i$. During the scan of a group $g$, we generate a pair $(v,p)$ for each of the $g_{i-1}$-edge-groups $\gamma$ inside $g$ that contains one or more nodes from $S$, where $v$ is the first node in $\gamma$ and $p$ is the pointer to $g$. These pairs are now sorted in reverse lexicographic order (the second component is most significant), so that the $g_i$-edge-groups can be accessed in the same order as they are located in $B_i$. For each such group $g$, we scan its index to find the relevant $g_{i-1}$-edge-groups and access these in the order of the index. Each $g_{i-1}$-edge-group is removed from its location in $B_i$ (leaving empty positions) and placed in the output set. We also remove its entry in the index of $g$. The I/O-bound of this process is the minimum I/O-bound of a scan of $B_i$ and a scan of each of the moved $g_{i-1}$-edge-groups.



3.2 Analysis 

In the following we analyze the number of I/Os performed by our cache-oblivious BFS-algorithm, assuming that the management of the layout of the $B_i$'s can be done efficiently (for this, see [9]). The underlying BFS-algorithm from [21] scans each BFS-level $L_i$ twice: once while constructing $L_{i+1}$ and once while constructing $L_{i+2}$, causing a total of $O(V/B)$ I/Os for all lists $L_i$. Edges extracted from the data structure storing the adjacency lists are sorted and scanned for filtering out duplicates and already discovered nodes, causing a total of $O(\mathrm{Sort}(E))$ I/Os.

We now turn to the I/Os performed during queries of the data structure storing the adjacency lists. The cost for constructing the initial spanning tree is $O(\mathrm{ST}(E))$; the Euler tour can be constructed in $O(\mathrm{Sort}(E))$ I/Os [4]. Assigning ranks to nodes and labeling the edges with the assigned ranks requires further $O(\mathrm{Sort}(E))$ I/Os. The total preprocessing of the data structure hence costs $O(\mathrm{ST}(E) + \mathrm{Sort}(E))$ I/Os.
For each query from the basic BFS-algorithm, the query algorithm for the data structure accesses the $A_i$ and $B_i$ lists. We first consider the number of I/Os for handling the $A_i$ lists. During a query, the algorithm scans each $A_i$ list at most a constant number of times: to identify which $g_i$-edge-groups to extract recursively from $G_{i+1}$; to merge $A_i$ with new entries extracted recursively; and to identify the $g_{i-1}$-edge-groups to extract from $B_i$. The number of distinct $g_i$-edge-groups is $2V/g_i$. Each group is inserted into level $G_i$ at most once. By Observation 1, when a $g_i$-edge-group is inserted into level $G_i$, it will become part of an initial query set $S$ within $g_i$ queries from the basic BFS-algorithm, that is, within the next $g_i$ BFS-levels; at this time, it will be removed from the structure. In particular, it will reside in level $G_i$ for at most $g_i$ queries. We conclude that the total cost of scanning $A_i$ during the complete run of the algorithm is $O((2V/g_i) \cdot g_i/B)$, implying a total number of $O(h \cdot V/B)$ I/Os for scanning all $A_i$ lists. This bound holds because the $A_i$'s are stored in consecutive memory locations, which can be considered to be scanned in a single scan during a query from the basic BFS-algorithm. Since each node is part of exactly one query set of the basic BFS-algorithm, the total I/O cost for scanning the $S$ sets during all recursive calls is also $O(h \cdot V/B)$.

We now bound the sorting cost caused during the recursive extraction of groups. The pointer to each $g_i$-edge-group participates in two sorting steps: When the group is moved from level $i+1$ to level $i$, the pairs generated when scanning $A_{i+1}$ are sorted before accessing $B_{i+1}$; when the $g_i$-edge-group has been extracted from $B_{i+1}$, the pointers to extracted groups are sorted before they are merged into $A_i$. We conclude that the total sorting cost is bounded by $O\bigl(\sum_{i=0}^{h} \mathrm{Sort}(2E/g_i)\bigr)$, which is $O(\mathrm{Sort}(E))$, since $g_i$ is at least exponentially increasing for both of the two sequences considered.

Finally, we need to bound the I/O cost of accessing the $B_i$ lists. For each query of the basic BFS-algorithm, these will be accessed in the order $B_h, B_{h-1}, \ldots, B_1$. Let $t$ be any integer for which $1 \le t \le h$. The cost of accessing $B_t, \ldots, B_1$ during a query is bounded by the cost of scanning $B_t, \ldots, B_1$. Since an edge in $B_i$ can only remain in $B_i$ for $g_i$ queries from the basic BFS-algorithm, we get a bound on the total I/O cost for $B_1, \ldots, B_t$ of $O\bigl(\sum_{i=1}^{t} g_i \cdot E/B\bigr)$, which is $O(g_t \cdot E/B)$ since $g_i$ is at least exponentially increasing. To bound the cost of accessing $B_h, \ldots, B_{t+1}$, we note that the number of I/Os for moving a $g_i$-edge-group list containing $k$ edges from $B_{i+1}$ to $B_i$ is bounded by $O(1 + k/B + g_{i+1}/(g_i B))$, where $g_{i+1}/g_i$ is the bound of the size of the index of a $g_{i+1}$-edge-group. Since the number of $g_i$-edge-groups is bounded by $2V/g_i$, the I/O cost for accessing $B_h, \ldots, B_{t+1}$ is bounded by

$$O\Bigl(\sum_{i=t}^{h-1} \Bigl(\frac{2V}{g_i} + \frac{E}{B} + \frac{2V}{g_i}\cdot\frac{g_{i+1}}{g_i B}\Bigr)\Bigr) = O\Bigl(\frac{V}{g_t} + h\cdot\frac{E}{B}\Bigr),$$

since $g_{i+1} \le g_i^2$ holds for both of the two sequences considered (when $0 < \varepsilon \le 1$). The total cost of accessing all $B_i$ is, hence, $O(g_t \cdot E/B + V/g_t + h \cdot E/B)$, for all $1 \le t \le h$. Adding all the bounds above gives a bound of $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + g_t \cdot E/B + V/g_t + h \cdot E/B)$ on the total number of I/Os incurred by the query algorithm, for all $1 \le t \le h$.

For $g_i = 2^i$, we select $g_t = \Theta(\sqrt{VB/E})$ and have $h = \Theta(\log V)$, so the I/O-bound becomes $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\log V + \sqrt{VE/B})$. For $g_i = \Theta(2^{(1+\varepsilon)^i})$, we select the smallest $g_t \ge \sqrt{VB/E}$, i.e., $g_t \le \sqrt{VB/E}^{\,1+\varepsilon}$, and have $h = \Theta(\frac{1}{\varepsilon}\log\log V)$, so the I/O-bound becomes $O(\mathrm{ST}(E) + \mathrm{Sort}(E) + \frac{E}{B}\cdot\frac{1}{\varepsilon}\cdot\log\log V + \sqrt{VE/B}\cdot\sqrt{VB/E}^{\,\varepsilon})$.
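The choice of $t$ is a plain trade-off between the terms $g_t \cdot E/B$ and $V/g_t$. The arithmetic sketch below (the helper `query_cost_terms` and all parameter values are hypothetical, and constant factors are dropped) evaluates $g_t \cdot E/B + V/g_t + h \cdot E/B$ for every $t$ of the doubling sequence and picks the minimizer.

```python
from typing import Dict, List


def query_cost_terms(V: int, E: int, B: int, g: List[int]) -> Dict[int, float]:
    """Evaluate g_t*E/B + V/g_t + h*E/B for t = 1..h (all constants dropped)."""
    h = len(g) - 1
    return {t: g[t] * E / B + V / g[t] + h * E / B for t in range(1, h + 1)}


# Hypothetical instance: V = 2^20, E = 2^22, B = 2^10, g_i = 2^i up to g_h = 2^21 > 2V - 2.
g = [2**i for i in range(22)]
costs = query_cost_terms(2**20, 2**22, 2**10, g)
best = min(costs, key=costs.get)
print(best, g[best], costs[best])
```

Here $\sqrt{VB/E} = \sqrt{2^8} = 16$, and the minimizing index indeed has $g_t = 16$, where the two balanced terms are both equal to $2^{16}$.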
Abstract. In recent years a large number of I/O-efficient algorithms have been developed for fundamental planar graph problems. Most of these algorithms rely on the existence of small planar separators as well as an $O(\mathrm{sort}(N))$ I/O algorithm for computing a partition of a planar graph based on such separators, where $O(\mathrm{sort}(N))$ is the number of I/Os needed to sort $N$ elements.

In this paper we simplify and unify several of the known planar graph results by developing linear I/O algorithms for the fundamental single-source shortest path, breadth-first search and topological sorting problems on planar directed acyclic graphs, provided that a partition is given; thus our results give $O(\mathrm{sort}(N))$ I/O algorithms for the three problems. While algorithms for all these problems were already known, the previous algorithms are all considerably more complicated than our algorithms and use $\Theta(\mathrm{sort}(N))$ I/Os even if a partition is known. Unlike the previous algorithm, our topological sorting algorithm is simple enough to be of practical interest.



1 Introduction 

Recently, external memory graph algorithms have received considerable attention because massive graphs arise naturally in a number of applications such as transportation networks and geographic information systems (GIS). When working with massive graphs, the I/O communication, and not the internal memory computation, is often the bottleneck. Efficient external-memory (or I/O-efficient) algorithms can thus lead to considerable runtime improvements.

The need for solving fundamental graph problems (such as topological sorting) on planar graphs often arises in, e.g., GIS. For example, in an application such as flow modeling on grid terrain models, each cell in the terrain model is assigned a flow direction to one of its neighbors such that the resulting graph is planar and acyclic. To trace the amount of flow through each cell of the terrain one then needs to topologically sort this graph [5]; external memory algorithms are needed to do so efficiently, as modern terrain models (and thus the manipulated planar graphs) are often massive, since projects such as NASA's EOS [1] and the Shuttle Radar Topography Mission [16] have acquired terabytes of terrain data in recent years.

Even though a large number of I/O-efficient graph algorithms have been 
developed, a number of fundamental problems on general graphs still remain 
open. For planar graphs, on the other hand, significant progress has been made. 
A large number of fundamental problems on undirected planar graphs have been 
solved I/O-efficiently [3,4,9,14] and recently several fundamental problems have 
also been solved for directed planar graphs [6,7]. Most of these algorithms are 
based on the existence of small planar separators. 

In this paper we simplify and unify several of the directed planar graph results 
by developing linear I/O algorithms for the fundamental single-source shortest 
path, breadth-first search and topological sorting problems on planar directed 
acyclic graphs, provided that a partition is given. Our algorithms rely on a set of 
reductions using the partition, which exploit the acyclicity in important ways. 
Previous algorithms all use more than linear I/Os even if a partition is known;
they are also considerably more complicated than our new algorithms. 



1.1 Problem Statement 

Let G = (V,E) be a directed acyclic graph (DAG). We say that G is planar if 
it can be embedded in the plane such that no edges intersect. The topological 
sorting problem is the problem of computing an order on the vertices of G so 
that for any edge (u, v) ∈ E, vertex u comes before vertex v in this order; a
graph can be topologically sorted if and only if it is acyclic. If G is weighted, 
the single-source shortest path (SSSP) problem is the problem of finding the 
shortest paths from a given source vertex s in G to all other vertices in G, where 
the length of a path is defined as the sum of the weights of the edges on the 
path. The breadth-first search (BFS) problem is equivalent to SSSP where all 
edges have weight one. 
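As a small illustration of the topological sorting definition (a sketch, not part of the paper), the following Python function checks whether a proposed vertex order is a topological order of a DAG given as an edge list. The function name `is_topological_order` is hypothetical, and it assumes the order lists every vertex exactly once.

```python
from collections.abc import Hashable, Iterable, Sequence


def is_topological_order(
    order: Sequence[Hashable], edges: Iterable[tuple[Hashable, Hashable]]
) -> bool:
    """Return True iff u precedes v in `order` for every edge (u, v).

    Assumes `order` lists every vertex of the graph exactly once.
    """
    position = {v: i for i, v in enumerate(order)}
    return all(position[u] < position[v] for u, v in edges)
```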

1.2 I/O-Model and Previous Results 

We will be working in the standard two-level I/O model [2], where N is the
number of vertices in the input graph, M is the number of vertices that can fit
into internal memory, and B is the number of vertices that can fit into a disk
block, with M < N and 1 ≤ B ≤ √M.¹ An I/O is the operation of transferring a
block of data between main memory and disk, and the complexity of an algorithm
is measured in terms of the number of disk blocks and I/Os it uses to solve a
problem. The minimal number of I/Os needed to read N input elements (the "linear
bound") is obviously scan(N) = Θ(N/B). The number of I/Os needed to sort N
elements is sort(N) = Θ((N/B) log_{M/B}(N/B)) [2].

¹ Some algorithms, like the planar separator algorithm of [14], make the stronger but
realistic assumption that M ≥ B² log² B. The algorithms described in this paper
make this assumption indirectly as they rely on planar separators.
For all realistic values of N, B, and M, scan(N) < sort(N) ≪ N, and the
difference in running time between an algorithm performing N I/Os and one
performing scan(N) or sort(N) I/Os can be very significant.
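To make the gap between these bounds concrete, here is a back-of-the-envelope Python sketch that evaluates scan(N) and sort(N) with constant factors dropped. The parameter values are hypothetical, chosen only so that B ≤ √M holds; they are not taken from the paper.

```python
import math


def scan_ios(n: float, b: float) -> float:
    """scan(N) = N/B block transfers (constant factors dropped)."""
    return n / b


def sort_ios(n: float, m: float, b: float) -> float:
    """sort(N) = (N/B) log_{M/B}(N/B) block transfers (constants dropped).

    At least one pass is charged, since even a tiny input must be read once.
    """
    blocks = n / b
    return blocks * max(1.0, math.log(blocks, m / b))


# Hypothetical parameters: 2^30 vertices, M = 2^24, B = 2^10 (so B <= sqrt(M)).
N, M, B = 2**30, 2**24, 2**10
print(f"scan(N) ~ {scan_ios(N, B):.3g}")     # prints scan(N) ~ 1.05e+06
print(f"sort(N) ~ {sort_ios(N, M, B):.3g}")  # prints sort(N) ~ 1.5e+06
print(f"N       = {N:.3g}")                  # prints N       = 1.07e+09
```

With these values an algorithm that performs one I/O per vertex does roughly a thousand times more I/Os than a sorting-bound algorithm.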

Despite considerable efforts, many fundamental problems on general graphs
remain open; refer to the surveys in [15,17] and the references therein. On
general digraphs the best known algorithm for SSSP, as well as the best
algorithms for the simpler BFS and DFS traversal problems, use
O(min{(|V| + |E|/B) · log|V| + sort(|E|), |V| + |V|/M · |E|/B}) I/Os [8,9,12].
Thus all these algorithms use Ω(|V|) I/Os, while the lower bound for the number
of I/Os required to solve most graph problems is Ω(min{|V|, sort(|V|)}) (which
in practice is Ω(sort(|V|))). As a result, improved algorithms have recently
been developed for special classes of graphs. On planar digraphs, SSSP, BFS, ear
decomposition, as well as topological sorting of an acyclic graph have been
solved in O(sort(N)) I/Os [6], while DFS can be solved in O(sort(N) log(N/M))
I/Os [7]. All these algorithms are based on I/O-efficient reductions [3,4,6,7]
and on an O(sort(N)) I/O planar graph separator algorithm [14].





Fig. 1. (a) Partition of G into clusters G_i (boxed) and separator vertices V_S (black).
(b) One cluster G_i in the partition and its adjacent boundary sets. For simplicity the
direction of the edges is not shown.



Almost all of the above-mentioned I/O-efficient algorithms for planar graphs
utilize the existence of small separators. An f(N)-separator of an N-vertex
graph G = (V,E) is a subset V_S of the vertices V of size f(N) such that the
removal of V_S partitions G into two subgraphs G_1 and G_2 of size at most 2N/3.
Lipton and Tarjan [13] showed that any planar graph has an O(√N)-separator.
Using this result recursively, Frederickson [11] showed that for any parameter
R ∈ [1, N], there exists a subset V_S of O(N/√R) vertices such that the removal
of V_S partitions G into O(N/R) subgraphs G_i of size O(R), where each G_i is
adjacent to O(√R) vertices of V_S. We call such a partitioning an R-partition.
The vertices in V_S are called the separator vertices and each of the graphs G_i
a cluster. The set of separator vertices adjacent to G_i is called the boundary
∂G_i of G_i, and its elements the boundary vertices of G_i. We use Ḡ_i to denote
the graph consisting of G_i, ∂G_i and the subset of edges of E connecting G_i
and ∂G_i. Refer to Fig. 1(a). The set of separator vertices can be partitioned
into maximal subsets so that the vertices in each subset are adjacent to the
same set of clusters. These sets are called the boundary sets of the partition.
If the graph has bounded degree (which can be ensured for planar graphs using a
simple transformation [11]), Frederickson showed that there exists an
R-partition with only O(N/R) boundary sets. Refer to Fig. 1(b). Maheshwari and
Zeh showed how to compute such an R-partition in O(sort(N)) I/Os, provided that
M ≥ B² log² B [14].
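The definition of boundary sets can be made concrete with a short Python sketch. Given an edge list and a map from each non-separator vertex to its cluster index (a hypothetical input format), it groups the separator vertices by the set of clusters they are adjacent to; the boundary ∂G_i is then the union of the groups whose key contains i. This only illustrates the definition; it is not the partitioning algorithm of [11] or [14].

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable


def boundary_sets(
    edges: Iterable[tuple[Hashable, Hashable]],
    cluster_of: dict[Hashable, int],
) -> dict[frozenset[int], list[Hashable]]:
    """Group separator vertices by the set of clusters they are adjacent to.

    `cluster_of` maps every non-separator vertex to its cluster index;
    any vertex missing from it is treated as a separator vertex (in V_S).
    Edge directions are ignored, and separator vertices adjacent to no
    cluster are omitted.
    """
    adjacent: dict[Hashable, set[int]] = defaultdict(set)
    for u, v in edges:
        for sep, other in ((u, v), (v, u)):
            if sep not in cluster_of and other in cluster_of:
                adjacent[sep].add(cluster_of[other])
    groups: dict[frozenset[int], list[Hashable]] = defaultdict(list)
    for sep, clusters in adjacent.items():
        groups[frozenset(clusters)].append(sep)
    return dict(groups)
```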

1.3 Our Results 

In this paper we simplify and unify several of the known planar graph results by
developing O(scan(N)) I/O algorithms for the fundamental single-source shortest
path, breadth-first search and topological sorting problems on planar DAGs,
provided that a B²-partition (that is, an R-partition with R = B²) is given.
Since such a partition can be computed in O(sort(N)) I/Os, our results give new
O(sort(N)) I/O algorithms for the three problems. While such algorithms were
already known, the previous algorithms are all considerably more complicated
than our algorithms and use Θ(sort(N)) I/Os even if a partition of the graph is
known; however, the previous BFS and SSSP algorithms work on general planar
digraphs. In particular, the previous topological sorting algorithm due to Arge,
Toma, and Zeh [6], which utilizes SSSP and computation of a directed ear
decomposition of a strongly connected directed planar graph, is much more
complex than our algorithm. We show that, given a B²-partition, topological
sorting of an N-vertex planar DAG can very easily be reduced in O(scan(N)) I/Os
to topological sorting of a (non-planar) DAG with O(N/B) vertices and O(N)
edges, which in turn can easily be solved in O(scan(N)) I/Os using a slightly
modified version of a simple internal memory algorithm. Unlike the previous
algorithm, our algorithm is simple enough to be of practical interest. Our
results unify many of the previous results by showing that computing a good
partition is the hard part of most planar DAG problems.

2 Topologically Sorting a Planar DAG 

In this section we show how, given a B²-partition of a planar DAG G with N
vertices, we can topologically sort G in O(scan(N)) I/Os. Recall that a
B²-partition consists of O(N/B²) clusters G_i of size O(B²) = O(M), each
adjacent to O(B) boundary vertices ∂G_i. We assume without loss of generality
that G has bounded degree, and that the B²-partition has O(N/B²) boundary sets
(sets of separator vertices adjacent to the same constant-sized set of
clusters). Furthermore, we assume that G is given in edge-list representation,
that is, as a list of edges with the edges incident to each vertex (both
incoming and outgoing) appearing consecutively in the list;² we assume that the
edges incident to vertices in each cluster G_i are stored consecutively, and so
are the edges incident to each boundary set. Note that this means that the graph
Ḡ_i induced by G_i and ∂G_i can be loaded into main memory in O(B) I/Os, and
that the O(B) edges incident to each boundary set can be loaded in O(1) I/Os.

² Any (reasonable) representation can be transformed into this representation in
O(sort(N)) I/Os.

Our algorithm consists of a series of reductions, each of which can be performed
in O(scan(N)) I/Os. First we reduce the problem of topologically sorting the DAG
G to the problem of computing longest paths in a DAG G^s with N + 1 vertices and
O(N) edges. We then show how to reduce this problem, using the B²-partition of
G, to computing longest paths in a weighted DAG G^R with O(N/B) vertices and
O(N) edges. This problem can in turn be reduced to computing a topological
sorting of the vertices in G^R, which we are finally able to solve directly and
efficiently because of the reduced number of vertices (as well as properties of
the B²-partition of G). Note that since computing longest paths is NP-complete
on general graphs [10], it is somewhat surprising that we are able to
topologically sort G by reducing the problem to computing longest paths. Note
also that the first two and last two reductions can easily be combined,
resulting in a relatively simple overall algorithm. Below we describe each of
these steps (Lemmas 2, 6, 7, and 8) and thus prove the following:

Theorem 1. Given a B²-partition of a planar DAG G in edge-list representation,
G can be topologically sorted in O(scan(N)) I/Os.

2.1 Reducing Topological Sorting of G to Longest Paths in G^s

Our first reduction simply consists of introducing a new source (indegree-zero)
vertex s and adding an edge from s to every indegree-zero vertex in G. We call
the resulting graph G^s. Note that G^s is still a DAG but not necessarily
planar. Now for each vertex v let λ[v] be the length (number of edges) of the
longest path from s to v. We can easily show that computing longest paths in
G^s corresponds to topologically sorting G:

Lemma 1. An ordering of the vertices in G^s by longest-path lengths λ[v] is a
topological ordering of the vertices in G.

Proof. We must prove that λ[u] < λ[v] for any (u,v) ∈ E. Assume, for
contradiction, that λ[u] ≥ λ[v]. Let p be the longest path from s to u, of
length λ[u]. Then the path p' obtained by adding the edge (u,v) to p is a path
from s to v. Since this path has length λ[u] + 1, we have λ[v] ≥ λ[u] + 1,
which implies that λ[u] < λ[v], a contradiction. □

Since we can add s to G and compute the O(N) extra edges in a scan of G, we
have obtained the following:

Lemma 2. Topologically sorting G can be reduced in O(scan(N)) I/Os to computing
longest-path lengths from s to all vertices in G^s.
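A minimal in-memory Python sketch of Lemmas 1 and 2 follows. As in the representation described in the next subsection, s is kept implicit: every indegree-zero vertex of G gets λ = 1, and any other vertex gets one more than the maximum over its in-neighbors. The memoized recursion is our own choice for brevity; it is limited by Python's recursion depth, suits only small graphs, and is not the I/O-efficient method of this section. The function name is hypothetical.

```python
from collections import defaultdict
from functools import lru_cache


def order_by_longest_path(vertices, edges):
    """Return (order, lengths): vertices sorted by λ[v], the number of edges
    on a longest path from the implicit source s to v in G^s."""
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def lam(v):
        if not preds[v]:
            return 1  # indegree zero in G: reached by the single edge (s, v)
        # Lemma 3: a longest path to v extends a longest path to an in-neighbor.
        return 1 + max(lam(u) for u in preds[v])

    lengths = {v: lam(v) for v in vertices}
    # Lemma 1: sorting by λ gives a topological order (ties in any order).
    return sorted(vertices, key=lengths.__getitem__), lengths
```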
2.2 Reducing Longest Paths in G^s to Longest Paths in G^R

In our second reduction we utilize the given B²-partition of G, which we assume
is stored implicitly in G^s. In fact, we assume that G^s is represented as the
list of edges in G (ordered as discussed in the beginning of this section), and
that the edges incident to s are represented implicitly by indegree-zero
vertices, that is, we do not store s or its incident edges explicitly.
Therefore, even though G^s is not planar, we will in the following refer to the
B²-partition of G^s.

The reduced graph G^R is defined as follows: The vertices of G^R consist of s
and the separator vertices in the B²-partition of G^s. To define the edges of
G^R we first consider each cluster G_i in turn and, for every pair of vertices u
and v on the boundary ∂G_i of G_i, we add an edge (u,v) if there is a path from
u to v in Ḡ_i (that is, in the graph induced by G_i ∪ ∂G_i); the edge (u,v) has
weight equal to the length of the longest path from u to v in Ḡ_i. In addition,
for each vertex u ∈ G_i with an edge (s,u) in G^s (i.e., indegree-zero vertices
in G) we also add edges (s,v) for all v ∈ ∂G_i with a path from u to v in Ḡ_i;
the edge (s,v) has weight equal to one plus the length of the longest path from
u to v in Ḡ_i. Finally, we add all edges between two separator vertices in G^s;
these edges have weight one.
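The per-cluster part of this construction can be sketched in Python under stated assumptions: Ḡ_i is given as an in-memory edge list, `sources` are the vertices of G_i with indegree zero in G, and the label "s" for the new source is a hypothetical placeholder that must not clash with a real vertex. Edges between two separator vertices (weight one) are global and are not produced here. Longest paths inside the cluster come from a unit-weight dynamic program over a topological order obtained with Python's standard `graphlib` module; all function names are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def longest_from(source, order, succ):
    """Unit-weight longest-path lengths from `source`; unreachable vertices are absent."""
    dist = {source: 0}
    for u in order:  # topological order, so dist[u] is final when u is visited
        if u in dist:
            for v in succ[u]:
                dist[v] = max(dist.get(v, 0), dist[u] + 1)
    return dist


def reduced_edges_for_cluster(cluster_edges, boundary, sources, s="s"):
    """Weighted edges (x, v, weight) that one cluster G_i contributes to G^R.

    cluster_edges: edges of Ḡ_i (inside G_i or between G_i and ∂G_i)
    boundary:      the boundary vertices ∂G_i
    sources:       vertices of G_i with indegree zero in G
    """
    succ, preds = defaultdict(list), defaultdict(set)
    for u, v in cluster_edges:
        succ[u].append(v)
        preds[v].add(u)
    for x in list(boundary) + list(sources):
        preds.setdefault(x, set())  # keep vertices without in-edges in the order
    order = list(TopologicalSorter(preds).static_order())

    edges = []
    for u in boundary:  # boundary-to-boundary edges, weight = longest path
        dist = longest_from(u, order, succ)
        edges += [(u, v, dist[v]) for v in boundary if v != u and v in dist]
    best = {}  # heaviest (s, v) edge over all sources u
    for u in sources:
        dist = longest_from(u, order, succ)
        for v in boundary:
            if v in dist:
                best[v] = max(best.get(v, 0), 1 + dist[v])
    edges += [(s, v, w) for v, w in best.items()]
    return edges
```

Keeping only the heaviest of several parallel (s, v) edges is a design choice of the sketch; it does not change any longest-path length in G^R.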

G^R has O(N/B) vertices since the number of separator vertices in the
B²-partition of G^s is O(N/B). Since each of the O(N/B²) clusters G_i has O(B)
boundary vertices ∂G_i, each cluster contributes O(B²) edges between vertices in
∂G_i, as well as O(B) edges between s and vertices in ∂G_i. Together with the
O(N/B) edges between separator vertices (recall that G has bounded degree), G^R
thus has O(N/B²) · O(B²) + O(N/B) = O(N) edges in total.

In order to prove that the lengths (weights) of the longest paths to separator
vertices are the same in G^s and G^R, we need the following general lemma:

Lemma 3. Subpaths of longest paths in a DAG are longest paths. 

Proof. Let p be the longest path between two vertices u and v in a DAG G. Let
u_1 and u_2 be two vertices on p and let p_{u_1u_2} be the subpath of p between
u_1 and u_2. Assume, for contradiction, that p_{u_1u_2} is not the longest path
from u_1 to u_2 in G, that is, there exists a path p' from u_1 to u_2 that is
longer than p_{u_1u_2}. Since G is a DAG, p' cannot contain a vertex that
appears on p before u_1 or after u_2 (if it contained such a vertex w, say,
after u_2, there would be a cycle in G containing u_2 and w). Thus we can
replace p_{u_1u_2} in p by p' and obtain a path from u to v that is longer than
p. This contradicts that p is the longest path from u to v. □

Lemma 4. For each separator vertex v in G^R, the longest-path length λ_R[v]
between s and v in G^R is equal to the longest-path length λ[v] in G^s.

Proof. Consider the longest path p from s to a separator vertex v in G^s; let
u_1 and u_2 be two consecutive separator vertices on p on the boundary of the
same cluster G_i, that is, u_1, u_2 ∈ ∂G_i and all vertices between u_1 and u_2
on p are in G_i; refer to Fig. 2(a). Since there is a path from u_1 to u_2 in
Ḡ_i, there must be an edge (u_1, u_2) in G^R; this in particular means that
there is a path from s to v in G^R. By definition, the weight of edge
(u_1, u_2) in G^R is equal to the length of the longest path from u_1 to u_2 in
Ḡ_i. By Lemma 3 the length of this path must be the same as the length of the
subpath of p from u_1 to u_2. Thus λ_R[v] = λ[v]. □

Fig. 2. (a) Lemma 4. (b) Lemma 5.

If we can compute the longest-path lengths to all vertices v in G^R, and thus
the longest-path lengths λ[v] to all separator vertices v in G^s (Lemma 4), we
can compute the longest-path lengths to the remaining vertices in G^s using the
following lemma:

Lemma 5. Let u be a vertex in the cluster G_i of G^s and let λ_i(v,u) denote
the length of the longest path from a boundary vertex v ∈ ∂G_i to u in Ḡ_i. The
length of the longest path from s to u in G^s is

$$\lambda[u] = \max\Bigl\{1,\ \max_{v \in \partial G_i}\bigl\{\lambda[v] + \lambda_i(v,u)\bigr\}\Bigr\}$$

Proof. Let p be the longest path from s to u in G^s. Either p has length one
(edge (s,u)) or it must contain a vertex on the boundary ∂G_i of G_i. Consider
the last such vertex v ∈ ∂G_i and let p_{sv} and p_{vu} denote the subpaths of p
from s to v and from v to u, respectively. Refer to Fig. 2(b). By Lemma 3, since
p is the longest path to u and G^s is a DAG, p_{sv} must be the longest path
from s to v in G^s and p_{vu} the longest path from v to u in Ḡ_i. It follows
that we can find the length of p (that is, λ[u]) by evaluating λ[v] + λ_i(v,u)
for each vertex v on ∂G_i. □
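A Python sketch of how Lemma 5 might be applied to one cluster, assuming the λ values of all vertices in ∂G_i are already known from G^R. Rather than evaluating λ_i(v,u) separately for each boundary vertex v, it seeds every boundary vertex with λ[v] and every indegree-zero vertex of G_i with 1, then runs a single longest-path sweep over Ḡ_i in topological order (via Python's `graphlib`). Seeding the indegree-zero vertices plays the role of the edges (s,u), so paths that enter G_i directly from s are also covered. The function and parameter names are hypothetical.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def extend_to_cluster(cluster_edges, boundary_lambda, sources):
    """λ[u] for every vertex of Ḡ_i.

    cluster_edges:   edges of Ḡ_i
    boundary_lambda: {v: λ[v]} for every v in ∂G_i, already final (Lemma 4)
    sources:         vertices of G_i with indegree zero in G
    """
    sources = set(sources)
    preds = defaultdict(set)
    for u, v in cluster_edges:
        preds[v].add(u)
    for w in sources:
        preds.setdefault(w, set())  # isolated sources still get a value

    lam = dict(boundary_lambda)
    for u in TopologicalSorter(preds).static_order():
        if u in boundary_lambda:
            continue  # separator vertex: its value comes from G^R
        # In-neighbors precede u in the order, so their values are final.
        candidates = [lam[v] + 1 for v in preds[u]]
        if u in sources:
            candidates.append(1)  # the edge (s, u) of G^s
        lam[u] = max(candidates)
    return lam
```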

We can easily compute G^R from G^s I/O-efficiently as follows: We load each of
the O(N/B²) graphs Ḡ_i induced by G_i and ∂G_i into main memory in turn, use an
internal memory algorithm to compute the relevant O(B²) edges between the O(B)
boundary vertices ∂G_i, and write these edges back to disk. During this process
we also compute the O(N/B) edges incident to s and retain the O(N/B) edges in
G^s between the separator vertices. In total we use
O((N/B²) · B + scan(N)) = O(scan(N)) I/Os. In the subsequent subsections we
will assume that G^R is represented similarly to the way G^s (and G) is
represented, that is, as a list of edges such that all edges incident to each
vertex are stored consecutively and, furthermore, such that edges incident to
the vertices in each boundary set of G^s are stored consecutively (even though
the vertices in the clusters G_i are removed from G^R, we will still refer to
the boundary sets of the vertices of G^R). We can easily produce this
representation in a simple scan of the produced edges using another O(scan(N))
I/Os.

After having computed the longest-path lengths λ_R[v] to all vertices v in G^R,
we can easily compute the longest-path lengths to the remaining vertices in G^s
in O(scan(N)) I/Os using Lemma 5. We load each cluster G_i and its boundary
vertices ∂G_i (now marked with longest-path lengths) in turn, compute λ_i(v,u)
for each pair of vertices v ∈ ∂G_i and u ∈ G_i using an internal memory
algorithm, thus compute λ[u], and finally write all the longest-path lengths
back to disk. Overall we have proved the following:

Lemma 6. Computing longest-path lengths for all vertices in the DAG G^s with
N + 1 vertices and O(N) edges can be reduced in O(scan(N)) I/Os to computing
longest-path lengths in the DAG G^R with O(N/B) vertices and O(N) edges.

2.3 Reducing Longest Paths in G^R to Topologically Sorting G^R

Computing longest-path lengths in G^R can easily be reduced to topologically
sorting G^R (utilizing the ideas of a standard linear time algorithm for
computing shortest paths in a DAG [10]). The basic observation is that if the
last edge on the longest path p from s to a vertex u is (v,u), then the part of
p from s to v is the longest path to v (Lemma 3). This means that if
(v_1,u), ..., (v_k,u) are the k in-edges of u and w(v_i,u) is the weight of edge
(v_i,u), then

$$\lambda_R[u] = \max\{\lambda_R[v_1] + w(v_1,u),\ \lambda_R[v_2] + w(v_2,u),\ \ldots,\ \lambda_R[v_k] + w(v_k,u)\}.$$

Thus if we process the vertices of G^R in topological order and compute their
longest paths, we know that when processing vertex u we have already computed
the longest paths to all in-neighbors v_i. We can therefore easily compute the
longest path to u in a simple scan of its in-edges.
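The recurrence above can be sketched in Python for an in-memory weighted DAG given in topological order with in-edge lists. This shows only the internal-memory logic, not the boundary-set layout of the list L that makes it I/O-efficient; the function name and input format are hypothetical.

```python
def longest_paths_in_order(order, in_edges, source):
    """λ_R[u] for every u reachable from `source`.

    order:    vertices of G^R in topological order
    in_edges: {u: [(v, w(v, u)), ...]} listing the in-edges of each u
    """
    lam = {source: 0}
    for u in order:
        if u == source:
            continue
        # In-neighbors precede u in the order, so their values are final.
        candidates = [lam[v] + w for v, w in in_edges.get(u, ()) if v in lam]
        if candidates:
            lam[u] = max(candidates)
    return lam
```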

To implement the above algorithm I/O-efficiently, given G^R in topological
order, we maintain a list L of the longest-path lengths λ_R[u] to all vertices u
in G^R such that vertices in the same boundary set are stored consecutively.
Recall that a boundary set is defined as a maximal subset of boundary vertices
in G^s (and thus in G^R) that are adjacent to the same set of clusters G_i in
G^s, and that edges incident to vertices in the same boundary set are stored
consecutively in our representation of G^R. As we process a vertex u in the
topological order, we scan its O(B) in-edges from the representation of G^R and
load the longest-path lengths of all its O(B) in-neighbors v_i from L in order
to compute λ_R[u] = max_i {λ_R[v_i] + w(v_i,u)}; then we write λ_R[u] back to L.
Since G^R has O(N/B) vertices and O(N) edges, we use
O(N/B + scan(N)) = O(scan(N)) I/Os to retrieve all vertices and scan their
in-edges. To see that the O(N) accesses to L can also be performed in
O(scan(N)) I/Os, recall that each boundary set is of size O(B) and is accessed
once by each of its adjacent vertices in each of its adjacent clusters, that is,
O(B) times. Since the vertices in each boundary set are stored consecutively in
L, they can be loaded in O(1) I/Os. Since there are O(N/B²) boundary sets, the
total number of I/Os spent on accessing boundary sets from L is
O(B · N/B²) = O(scan(N)). We have obtained the following:
Lemma 7. Computing longest-path lengths in the DAG G^R with O(N/B) vertices
and O(N) edges can be reduced in O(scan(N)) I/Os to topologically sorting G^R.

2.4 Topologically Sorting G^R

Since G^R only has O(N/B) vertices (and is obtained from a B²-partition) we can
topologically sort it I/O-efficiently using a slightly modified version of a
standard topological sorting algorithm [10]. We first compute the indegree of
each vertex in G^R. Next we number the vertices one at a time while maintaining
a list Q of indegree-zero vertices; initially the list contains only the source
vertex s. We repeatedly remove and number an indegree-zero vertex v from Q; for
each such vertex v we remove all edges of the form (v,u) from G^R (v's
out-edges) by decrementing the indegree of u. When the indegree of a vertex u
becomes zero we insert it in Q. It is easy to see that this algorithm correctly
topologically sorts G^R.
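An internal-memory Python sketch of this numbering procedure (Kahn's algorithm) follows. It seeds Q with every indegree-zero vertex, which for G^R should be just s, and treats Q as a stack, since any removal order yields a valid topological numbering. It does not model the list L or the I/O behavior described next, and the function name is hypothetical.

```python
from collections import defaultdict


def topological_numbering(vertices, edges):
    """Number vertices so that u gets a smaller number than v for every edge (u, v)."""
    indegree = {v: 0 for v in vertices}
    out_edges = defaultdict(list)
    for u, v in edges:
        out_edges[u].append(v)
        indegree[v] += 1

    q = [v for v in vertices if indegree[v] == 0]  # for G^R: only s
    number = {}
    while q:
        v = q.pop()
        number[v] = len(number)
        for u in out_edges[v]:  # "remove" v's out-edges
            indegree[u] -= 1
            if indegree[u] == 0:
                q.append(u)
    if len(number) != len(indegree):
        raise ValueError("the graph contains a cycle")
    return number
```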

The above algorithm can be performed I/O-efficiently as follows: The initial
indegrees of all vertices can be computed in O(scan(N)) I/Os in a simple scan of
the representation of G^R. To number the vertices of G^R one by one efficiently,
we again exploit the topology of the boundary sets of G^R: we maintain the
indegrees in a list L such that the indegrees of vertices in the same boundary
set are stored consecutively in L. When processing an (indegree-zero) vertex v
we load v and all its O(B) neighbors from L, and scan through the (out-)edge
list of v while decrementing the relevant indegrees and writing them back to L.
In total there are O(N) accesses, one for each edge, but as in Section 2.3 we
can argue (using that there are only O(N/B²) boundary sets) that they are
performed in O(N/B) I/Os. Thus we have obtained the following:

Lemma 8. The DAG G^R with O(N/B) vertices and O(N) edges can be topologically
sorted in O(scan(N)) I/Os.

3 BFS and SSSP on Planar DAGs 

Given a topological order of the vertices of a general acyclic graph, SSSP (and
thus BFS) can easily be solved in O(sort(N)) I/Os using an I/O-efficient
priority queue. In this section we describe how this bound can be improved to
O(scan(N)) for planar DAGs if a B²-partition of the graph is given.

Our improved algorithm is essentially the same as our algorithm for computing
longest paths described in Section 2, but modified to compute shortest paths
rather than longest paths. Let s be the source vertex in G for the SSSP (BFS)
problem. We reduce computing SSSP on G to computing SSSP on a reduced graph G^R,
which we in turn reduce to computing a topological order on G^R. As previously,
the key to the algorithm is that all these reductions can be performed in
O(scan(N)) I/Os and that the reduced graph G^R can be topologically sorted in
O(scan(N)) I/Os (Lemma 8).

The reduced graph G^R is defined as follows. The vertices of G^R consist of the
source vertex s and the separator vertices in G. The edges are defined as in
Section 2.2, except that edge weights between vertices u, v on the boundary ∂G_i
of G_i correspond to shortest path lengths in Ḡ_i. The graph G^R is a DAG with
O(N/B) vertices and O(N) edges and can be computed in O(scan(N)) I/Os given a
B²-partition of G. Using the same arguments as previously, it is easy to show
that G^R preserves shortest paths in G, that is, that for any separator vertex v
the length δ_G(v) of the shortest path between s and v in G is the same as the
length δ_{G^R}(v) of the shortest path in G^R. Given that we can compute
shortest paths δ_G(v) to all vertices in G^R, we can then compute the shortest
paths to the remaining vertices u in G_i as

$$\delta_G(u) = \min_{v \in \partial G_i}\bigl\{\delta_G(v) + \delta_i(v,u)\bigr\},$$

where δ_i(v,u) denotes the length of the shortest path from v to u in Ḡ_i. This
can be done using O(scan(N)) I/Os as in Section 2.2. Since G^R is a DAG,
shortest paths in G^R can be computed in the same way as longest paths in G^R
(Section 2.3) by processing vertices in topological order. Finally, a
topological order of G^R can be computed in O(scan(N)) I/Os using Lemma 8. We
have the following.

Theorem 2. Given a B²-partition of a planar DAG G in edge-list representation,
BFS and SSSP can be solved in O(scan(N)) I/Os on G.
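For completeness, a Python sketch of the shortest-path variant on an in-memory DAG: the same sweep in topological order as for longest paths, with max replaced by min and unreachable vertices left at infinity; BFS is the special case where every weight is one. As before, this illustrates only the recurrence, not the I/O-efficient layout, and the function name and input format are hypothetical.

```python
import math


def dag_shortest_paths(order, in_edges, source):
    """δ(v) from `source` for every v in `order` (a topological order).

    in_edges: {v: [(u, w(u, v)), ...]}; math.inf marks unreachable vertices.
    Setting every weight to 1 gives BFS levels.
    """
    dist = {v: math.inf for v in order}
    dist[source] = 0
    for v in order:
        for u, w in in_edges.get(v, ()):
            # u precedes v in the order, so dist[u] is already final.
            dist[v] = min(dist[v], dist[u] + w)
    return dist
```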

4 Conclusion and Open Problems 

In this paper we developed simple linear I/O algorithms for the single-source
shortest path, breadth-first search and topological sorting problems on planar
DAGs, provided that a B²-partition is given. Our algorithms rely on a set of
reductions using the partition and essentially exploit the acyclicity of the
graph. This leads to O(sort(N)) I/O algorithms for the three problems that are
much simpler than the previously known algorithms.

It remains an intriguing open problem to develop an O(sort(N)) directed DFS
algorithm, on planar graphs as well as on general graphs. The results of this
paper naturally raise the question of whether acyclicity can be exploited to
derive an efficient DFS algorithm for planar DAGs.